Dynamic Frame Skipping for Fast Speech Recognition in Recurrent Neural Network Based Acoustic Models
Abstract
A recurrent neural network is a powerful tool for modeling sequential data such as text and speech. While recurrent neural networks have achieved record-breaking results in speech recognition, one remaining challenge is their slow processing speed. This slowness stems from the sequential nature of recurrent neural networks, which read only one frame at each time step; reducing the number of frames read is therefore an effective way to cut processing time. In this paper, we propose a novel recurrent neural network architecture called Skip-RNN, which dynamically skips speech frames that are less important. The Skip-RNN consists of an acoustic model network and a skip-policy network that are jointly trained to classify speech frames and to determine how many frames to skip. We evaluate our proposed approach on the Wall Street Journal corpus and show that it can accelerate acoustic model computation by up to 2.4 times without any noticeable degradation in transcription accuracy.
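The read loop the abstract describes can be sketched in a few lines. This is a minimal toy illustration, not the paper's implementation: the parameter matrices, the `max_skip` bound, and the argmax skip decision are all assumptions standing in for the jointly trained acoustic model and skip-policy networks (which in practice would be trained, e.g., with policy gradients rather than hard argmax).

```python
import numpy as np

def skip_rnn_forward(frames, hidden_dim=8, max_skip=3, seed=0):
    """Toy Skip-RNN forward pass: read a frame, update the hidden state,
    then let a skip-policy head decide how many upcoming frames to jump over.
    All weights here are random stand-ins for trained parameters."""
    rng = np.random.default_rng(seed)
    n_frames, feat_dim = frames.shape
    W_x = rng.standard_normal((feat_dim, hidden_dim)) * 0.1   # input weights
    W_h = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1  # recurrent weights
    W_skip = rng.standard_normal((hidden_dim, max_skip + 1)) * 0.1  # skip-policy head

    h = np.zeros(hidden_dim)
    outputs, t, reads = [], 0, 0
    while t < n_frames:
        # Acoustic-model update: only frames actually read cost RNN compute.
        h = np.tanh(frames[t] @ W_x + h @ W_h)
        outputs.append(h.copy())
        reads += 1
        # Skip-policy decision: 0..max_skip frames to skip before the next read.
        skip = int(np.argmax(h @ W_skip))
        t += 1 + skip
    return np.stack(outputs), reads

frames = np.random.default_rng(1).standard_normal((50, 13))  # e.g. 13-dim MFCC frames
outputs, reads = skip_rnn_forward(frames)
print(f"read {reads} of {len(frames)} frames")
```

Because each step advances by at least one frame and at most `1 + max_skip`, the number of RNN updates falls between `n_frames / (max_skip + 1)` and `n_frames`, which is where the speedup comes from.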