Performance of hybrid MMI-connectionist/HMM systems on the WSJ speech database
Abstract
A hybrid MMI-connectionist/hidden Markov model (HMM) speech recognition system for the Wall Street Journal (WSJ) database is presented. The HMM part of this system uses discrete probability density functions (PDFs). A neural network (NN) replaces the classical vector quantizer (VQ), such as one built with the k-means or LBG algorithm, that is typically used in discrete HMM systems. The NN is trained with an algorithm that seeks to maximize the mutual information (MMI) between the generated output labels and the underlying phonetic description. The system has been trained and tested on the 5,000-word speaker-independent WSJ task. The error rates of the MMI-connectionist approach are 21% lower than those of a k-means system. The system reaches error rates previously achieved only by the best continuous and semi-continuous HMM speech recognizers, with the added advantage of a faster recognition algorithm.
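The MMI criterion mentioned in the abstract can be illustrated with a small sketch that estimates the mutual information between a sequence of quantizer labels and the corresponding phonetic classes. This is not the training algorithm of the paper, only an empirical estimate of the quantity it aims to maximize; the function name `mutual_information` and the toy label and phone sequences are hypothetical.

```python
import math
from collections import Counter


def mutual_information(labels, phones):
    """Empirical mutual information I(label; phone), in bits,
    estimated from two frame-aligned sequences (illustrative only)."""
    if len(labels) != len(phones):
        raise ValueError("labels and phones must have the same length")
    n = len(labels)
    joint = Counter(zip(labels, phones))   # counts of (label, phone) pairs
    label_counts = Counter(labels)         # marginal counts of labels
    phone_counts = Counter(phones)         # marginal counts of phones
    mi = 0.0
    for (m, w), c in joint.items():
        # p(m, w) * log2( p(m, w) / (p(m) * p(w)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (label_counts[m] * phone_counts[w]))
    return mi


# Toy data (assumed): two phone classes, four frames.
phones = ["a", "a", "b", "b"]
informative = [0, 0, 1, 1]    # labels that determine the phone exactly
uninformative = [0, 1, 0, 1]  # labels statistically independent of the phone

mi_informative = mutual_information(informative, phones)
mi_uninformative = mutual_information(uninformative, phones)
```

In this toy case the informative labels carry one bit about the phone class and the uninformative labels carry none. A quantizer trained under an MMI criterion would prefer the first kind of labeling, whereas k-means only minimizes distortion in the feature space and ignores the phonetic classes.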