The microsoft 2016 conversational speech recognition system
Citations Over TimeTop 1% of 2017 papers
Abstract
We describe Microsoft's conversational speech recognition system, in which we combine recent developments in neural-network-based acoustic and language modeling to advance the state of the art on the Switchboard recognition task. Inspired by machine learning ensemble techniques, the system uses a range of convolutional and recurrent neural networks. I-vector modeling and lattice-free MMI training provide significant gains for all acoustic model architectures. Language model rescoring with multiple forward and backward running RNNLMs, and word posterior-based system combination provide a 20% boost. The best single system uses a ResNet architecture acoustic model with RNNLM rescoring, and achieves a word error rate of 6.9% on the NIST 2000 Switchboard task. The combined system has an error rate of 6.2%, representing an improvement over previously reported results on this benchmark task.
Related Papers
- → Capitalization Normalization for Language Modeling with an Accurate and Efficient Hierarchical RNN Model(2022)7 cited
- → Improved language modelling through better language model evaluation measures(2001)11 cited
- → Speech recognition experiments using multi-span statistical language models(1999)6 cited
- → Deep Learning Based Language Modeling for Domain-Specific Speech Recognition(2017)1 cited
- → Initial decoding with minimally augmented language model for improved lattice rescoring in low resource ASR(2024)