Comparing computation in Gaussian mixture and neural network based large-vocabulary speech recognition
Published in 2013
Abstract
In this paper we examine real-time computing issues in large-vocabulary speech recognition. We use the French broadcast audio transcription task from ETAPE 2011 for this evaluation. We compare word error rate (WER) against overall computing time for hidden Markov models with Gaussian mixtures (GMM-HMM) and with deep neural networks (DNN-HMM). We show that, for a similar amount of computation during recognition, the DNN-HMM combination is superior to the GMM-HMM. In a real-time computing scenario, the error rate on the ETAPE dev set is 23.5% for the DNN-HMM versus 27.9% for the GMM-HMM: a significant difference in accuracy for comparable computation. Rescoring lattices (generated by the DNN-HMM acoustic model) with a quadgram language model (LM), and then with a neural net LM, reduces the WER to 22.0% while still running in real time.
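The two quantities compared in the abstract, word error rate and computing time relative to audio duration, can be illustrated with a minimal sketch. It assumes the conventional definitions: WER as word-level edit distance divided by reference length, and a real-time factor (processing time divided by audio duration) of at most 1 as the meaning of "real time". The function names and these definitions are illustrative assumptions, not taken from the paper's own scoring or timing setup.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumed conventional definition, computed with word-level
    Levenshtein distance; the paper's scoring tool is not specified here.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration (assumed definition).

    A value of at most 1.0 means decoding keeps up with the audio.
    """
    return processing_seconds / audio_seconds


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative error reduction of `improved` with respect to `baseline`."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # WER figures quoted in the abstract (ETAPE dev set, real-time scenario).
    print(f"DNN-HMM vs GMM-HMM: {relative_reduction(27.9, 23.5):.1%}")
    print(f"After LM rescoring: {relative_reduction(27.9, 22.0):.1%}")
```

Applied to the figures quoted above, going from 27.9% to 23.5% WER is a relative reduction of about 15.8%, and going from 27.9% to 22.0% is about 21.1%. These percentages are derived here from the abstract's numbers and are not reported in the abstract itself.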