Analyzing Uncertainties in Speech Recognition Using Dropout
Abstract
The performance of Automatic Speech Recognition (ASR) systems is usually measured by Word Error Rate (WER), which requires manually transcribed data that is time-consuming and expensive to produce. In this paper, we use state-of-the-art ASR systems based on Deep Neural Networks (DNN) and propose a novel framework that applies "Dropout" at test time to model uncertainty in the prediction hypotheses. We systematically exploit this uncertainty to estimate WER without the need for explicit transcriptions. In addition, we show that the predictive uncertainty can be used to accurately localize the errors made by the ASR system. We study the performance of our approach on the Switchboard database, where it predicts WER to within 2.6% and 5.0% for HMM-DNN and Connectionist Temporal Classification (CTC) ASR systems, respectively.
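The abstract does not spell out how the dropout-induced uncertainty is converted into a WER estimate. As a minimal sketch of the general idea, and not the authors' method, the Python below computes standard WER against a reference and then a transcription-free disagreement score: the average word-level edit distance between a baseline hypothesis (decoded without dropout) and several hypotheses decoded with dropout active. The function names `edit_distance`, `wer`, and `dropout_disagreement` are hypothetical, and the hypotheses are assumed to be plain whitespace-tokenized strings already produced by some ASR decoder.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, start=1):
        curr = [i]
        for j, h in enumerate(hyp_words, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def dropout_disagreement(base_hypothesis, dropout_hypotheses):
    """Assumed uncertainty proxy: mean WER of dropout hypotheses,
    scored against the no-dropout hypothesis instead of a transcript."""
    if not dropout_hypotheses:
        raise ValueError("need at least one dropout hypothesis")
    scores = [wer(base_hypothesis, h) for h in dropout_hypotheses]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # True WER needs a manual transcript.
    print(wer("the cat sat on the mat", "the cat sat on a mat"))
    # The disagreement score needs only ASR outputs.
    print(dropout_disagreement("the cat sat",
                               ["the cat sat", "the cat sad", "a cat sat"]))
```

Under this assumption, utterances whose dropout hypotheses diverge strongly from the baseline would be the ones flagged as likely to contain errors; how such a score is mapped to an actual WER value is left open here because the abstract does not say.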