Character-level incremental speech recognition with recurrent neural networks
Abstract
In real-time speech recognition applications, latency is an important issue. We have developed a character-level incremental speech recognition (ISR) system that responds quickly even while the user is still speaking, gradually refining its hypotheses as the speech proceeds. The algorithm employs a speech-to-character unidirectional recurrent neural network (RNN), trained end-to-end with connectionist temporal classification (CTC), and an RNN-based character-level language model (LM). The CTC-trained RNN outputs character-level probabilities, which are processed by beam search decoding. The RNN LM augments the decoding by providing long-term dependency information. We propose a tree-based online beam search with additional depth-pruning, which enables the system to process indefinitely long input speech with low latency. The system not only responds quickly to speech but can also dictate out-of-vocabulary (OOV) words according to their pronunciation. The proposed model achieves a word error rate (WER) of 8.90% on the Wall Street Journal (WSJ) Nov'92 20K evaluation set when trained on the WSJ SI-284 training set.
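The core decoding step the abstract describes — turning per-frame character probabilities from the CTC-trained RNN into a transcript via beam search — can be illustrated with a minimal CTC prefix beam search. This is a generic sketch, not the paper's tree-based online variant: it omits the RNN LM rescoring and the depth-pruning, and assumes the blank symbol sits at index 0 of each probability row.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")

def logsumexp(*xs):
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))

def ctc_beam_search(log_probs, alphabet, beam_width=8):
    """Prefix beam search over per-frame character log-probabilities.

    log_probs: T rows, each of length len(alphabet)+1;
               index 0 is the CTC blank (assumption).
    Returns the most probable collapsed character sequence.
    """
    # prefix -> (log P of paths ending in blank, log P ending in a character)
    beams = {(): (0.0, NEG_INF)}
    for frame in log_probs:
        next_beams = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beams.items():
            # Emit blank: prefix is unchanged, path now ends in blank.
            nb_b, nb_nb = next_beams[prefix]
            next_beams[prefix] = (
                logsumexp(nb_b, p_b + frame[0], p_nb + frame[0]), nb_nb)
            for i, ch in enumerate(alphabet, start=1):
                p = frame[i]
                new = prefix + (ch,)
                if prefix and ch == prefix[-1]:
                    # Repeated character: without an intervening blank it
                    # collapses onto the same prefix ...
                    nb_b, nb_nb = next_beams[prefix]
                    next_beams[prefix] = (nb_b, logsumexp(nb_nb, p_nb + p))
                    # ... and only paths ending in blank can extend it.
                    nb_b2, nb_nb2 = next_beams[new]
                    next_beams[new] = (nb_b2, logsumexp(nb_nb2, p_b + p))
                else:
                    nb_b2, nb_nb2 = next_beams[new]
                    next_beams[new] = (
                        nb_b2, logsumexp(nb_nb2, p_b + p, p_nb + p))
        # Keep only the beam_width most probable prefixes.
        beams = dict(sorted(next_beams.items(),
                            key=lambda kv: logsumexp(*kv[1]),
                            reverse=True)[:beam_width])
    best = max(beams.items(), key=lambda kv: logsumexp(*kv[1]))[0]
    return "".join(best)
```

In the paper's setting, a character-level RNN LM score would be added whenever a prefix is extended by a character, and the prefixes would be kept in a tree so that decoding can run online over unbounded speech.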