Deep neural network based spectral feature mapping for robust speech recognition
Top 10% of 2015 papers
Abstract
Automatic speech recognition (ASR) systems suffer from performance degradation under noisy and reverberant conditions. In this work, we explore a deep neural network (DNN) based approach for spectral feature mapping from corrupted speech to clean speech. The DNN-based mapping substantially reduces interference and produces estimated clean spectral features for ASR training and decoding. We experiment with several different feature mapping approaches and demonstrate that a DNN trained to predict clean log filterbank coefficients directly from noisy spectrograms can be extremely effective. The experiments show that ASR systems using these cleaned features perform well under joint noisy and reverberant conditions, and achieve state-of-the-art results on the CHiME-2 corpus with stereo (corrupted and clean) data.
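The mapping the abstract describes can be sketched as a regression network trained on stereo (corrupted, clean) pairs: a context window of noisy log filterbank frames goes in, and an estimate of the corresponding clean frame comes out, with mean-squared error as the loss. The following is a minimal hypothetical sketch in NumPy, not the authors' implementation; the feature dimension, context size, and network depth are all assumptions for illustration.

```python
# Hypothetical sketch of DNN-based spectral feature mapping (not the paper's code).
# A tiny one-hidden-layer MLP maps a context window of noisy log filterbank
# frames to an estimate of the clean center frame, trained by SGD on MSE
# using a stereo (noisy, clean) pair.
import numpy as np

rng = np.random.default_rng(0)

N_MEL = 24        # log filterbank coefficients per frame (assumed size)
CONTEXT = 5       # frames of acoustic context on each side (assumed)
D_IN = N_MEL * (2 * CONTEXT + 1)
D_HID = 128       # a real system would use several wider layers

W1 = rng.normal(0, 0.01, (D_IN, D_HID)); b1 = np.zeros(D_HID)
W2 = rng.normal(0, 0.01, (D_HID, N_MEL)); b2 = np.zeros(N_MEL)

def forward(x):
    """Map a flattened noisy context window to an estimated clean frame."""
    h = np.maximum(0.0, x @ W1 + b1)   # ReLU hidden layer
    return h @ W2 + b2, h              # linear output for regression

def mse_step(x, y_clean, lr=1e-2):
    """One SGD step on the MSE between the estimate and the clean target."""
    global W1, b1, W2, b2
    y_hat, h = forward(x)
    err = y_hat - y_clean                    # gradient of 0.5*||err||^2 w.r.t. y_hat
    gW2 = np.outer(h, err); gb2 = err
    dh = (W2 @ err) * (h > 0)                # backprop through the ReLU
    gW1 = np.outer(x, dh); gb1 = dh
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1
    return float(np.mean(err ** 2))

# Toy stereo pair: a synthetic "clean" frame, and the same frame tiled across
# the context window with additive noise as the corrupted observation.
clean = rng.normal(size=N_MEL)
noisy_window = np.tile(clean, 2 * CONTEXT + 1) + rng.normal(0, 0.5, D_IN)

losses = [mse_step(noisy_window, clean) for _ in range(200)]
print(losses[0] > losses[-1])  # the MSE decreases on this toy example
```

At decoding time, the network's output (the estimated clean log filterbank features) would replace the noisy features fed to the acoustic model, which is the "cleaned features" setup the abstract evaluates on CHiME-2.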
Related Papers
- → SW-WAVENET: Learning Representation from Spectrogram and Wavegram Using Wavenet for Anomalous Sound Detection (2023), 19 citations
- → Using the reassigned spectrogram to obtain a voiceprint (2006)
- → The preliminary application of Gabor spectrogram analysis in speech samples (1993)
- → Estimation of Clean Spectrogram Noisy Value Functions Based on Metropolis Iterative Algorithm (2013)
- → Anatomy of a Spectrogram (2024)