Low Latency Acoustic Modeling Using Temporal Convolution and LSTMs
Abstract
Bidirectional long short-term memory (BLSTM) acoustic models provide a significant word error rate reduction compared to their unidirectional counterparts, as they model both past and future temporal contexts. However, it is nontrivial to deploy bidirectional acoustic models for online speech recognition due to an increase in latency. In this letter, we propose the use of temporal convolution, in the form of time-delay neural network (TDNN) layers, along with unidirectional LSTM layers to limit the latency to 200 ms. This architecture has been shown to outperform the state-of-the-art low frame rate (LFR) BLSTM models. We further improve these LFR BLSTM acoustic models by operating them at higher frame rates at lower layers, and show that the proposed model performs similarly to these mixed frame rate BLSTMs. We present results on the Switchboard 300 h LVCSR task and the AMI LVCSR task in its three microphone conditions.
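The latency bound described in the abstract follows from a simple accounting: each TDNN layer splices frames at fixed relative offsets, the maximum right (future) offsets accumulate across layers, and unidirectional LSTM layers add no future context. A minimal sketch of this arithmetic, using illustrative splice offsets that are assumptions rather than the paper's exact configuration:

```python
# Minimal sketch: how per-layer TDNN splice offsets accumulate into a total
# right context, which (times the frame shift) bounds the model's lookahead
# latency. The offsets below are illustrative assumptions, not the paper's
# exact architecture.

FRAME_SHIFT_MS = 10  # standard 10 ms frame shift

# Each tuple lists the relative frame offsets a TDNN layer splices together.
# Unidirectional LSTM layers contribute no future offsets, so they are omitted.
tdnn_splice_offsets = [
    (-2, -1, 0, 1, 2),  # layer 1: dense splicing over +/-2 frames
    (-1, 0, 1),         # layer 2
    (-3, 0, 3),         # layer 3: wider, subsampled splicing (assumed)
]

def total_right_context(layers):
    """Future frames required: sum of each layer's maximum right offset."""
    return sum(max(offsets) for offsets in layers)

right_frames = total_right_context(tdnn_splice_offsets)
latency_ms = right_frames * FRAME_SHIFT_MS
print(right_frames, latency_ms)  # 6 frames of lookahead -> 60 ms
```

Under these assumed offsets the network needs 6 future frames (60 ms) before it can emit an output for the current frame; the paper's 200 ms budget constrains this accumulated right context across the whole stack.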