Exploring recognition network representations for efficient speech inference on highly parallel platforms
Abstract
The emergence of highly parallel computing platforms is enabling new trade-offs in algorithm design for automatic speech recognition. It naturally motivates the following investigation: do the most computationally efficient sequential algorithms lead to the most computationally efficient parallel algorithms? In this paper we explore two contending recognition network representations for speech inference engines: the linear lexical model (LLM) and the weighted finite state transducer (WFST). We demonstrate that while an inference engine using the simpler LLM representation evaluates 22× more transitions than the advanced WFST representation, the simple structure of the LLM representation allows 4.7-6.4× faster evaluation and 53-65× faster operand gathering for each state transition. We use the 5k Wall Street Journal corpus to experiment on the NVIDIA GTX480 (Fermi) and the NVIDIA GTX285 Graphics Processing Units (GPUs), and illustrate that the performance of a speech inference engine based on the LLM representation is competitive with the WFST representation on highly parallel computing platforms.
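The trade-off stated in the abstract can be made concrete with a back-of-the-envelope calculation. The sketch below is only an illustration, not the authors' analysis: it assumes the quoted speedups are per-transition time ratios and that cost is uniform across transitions, and the function name `relative_phase_time` is hypothetical. It compares each phase in isolation. The abstract does not give the time split between evaluation and operand gathering, so no overall figure is derived.

```python
# Figures quoted in the abstract.
TRANSITION_RATIO = 22.0        # LLM evaluates 22x more transitions than WFST
EVAL_SPEEDUP = (4.7, 6.4)      # per-transition evaluation speedup of LLM
GATHER_SPEEDUP = (53.0, 65.0)  # per-transition operand-gathering speedup of LLM


def relative_phase_time(transition_ratio, per_transition_speedup):
    """Return LLM phase time divided by WFST phase time.

    Assumes uniform per-transition cost (an assumption of this sketch).
    Values above 1 mean the LLM phase takes longer; below 1, shorter.
    """
    return transition_ratio / per_transition_speedup


# Range of relative times for each phase, from slowest to fastest speedup.
eval_range = [relative_phase_time(TRANSITION_RATIO, s) for s in EVAL_SPEEDUP]
gather_range = [relative_phase_time(TRANSITION_RATIO, s) for s in GATHER_SPEEDUP]

print("evaluation phase, LLM/WFST time:", [round(r, 2) for r in eval_range])
print("gathering phase,  LLM/WFST time:", [round(r, 2) for r in gather_range])
```

Under these assumptions the evaluation phase still takes longer with the LLM (ratios above 1), while the operand-gathering phase takes less time (ratios below 1). That is consistent with the abstract's claim that the LLM is competitive rather than uniformly faster.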