Beamnet: End-to-end training of a beamformer-supported multi-channel ASR system
Abstract
This paper presents an end-to-end training approach for a beamformer-supported multi-channel ASR system. A neural network that estimates masks for a statistically optimum beamformer is trained jointly with a network for acoustic modeling. To update the mask estimator's parameters, we propagate the gradients from the acoustic model all the way through feature extraction and the complex-valued beamforming operation. Besides avoiding a mismatch between the front-end and the back-end, this approach also eliminates the need for stereo data, i.e., the parallel availability of clean and noisy versions of the signals. Instead, it can be trained with real noisy multi-channel data only. Moreover, because the beamformer relies only on signal statistics, the approach makes no assumptions about the configuration of the microphone array. In an evaluation on the CHiME-4 dataset, we further observe that joint training improves performance in terms of word error rate.
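To make the signal path concrete, the following is a minimal NumPy sketch of the forward pass of a mask-based beamformer: masks weight the multi-channel STFT to form speech and noise spatial covariance matrices, from which beamforming weights are computed and applied. The abstract does not say which statistically optimum beamformer is used, so an MVDR beamformer in its reference-channel form is assumed here purely for illustration. The function names, array layout, and regularization constants are likewise assumptions, not taken from the paper. In joint training, the same operations would be written in a framework with automatic differentiation so that gradients can flow back to the mask estimator.

```python
import numpy as np


def masked_psd(stft, mask):
    """Mask-weighted spatial covariance matrix per frequency bin.

    stft: complex array, shape (F, C, T) = (frequencies, channels, frames)
    mask: real array with values in [0, 1], shape (F, T)
    returns: complex array, shape (F, C, C)
    """
    weighted = stft * mask[:, None, :]
    # Sum over frames of mask * x x^H for every frequency bin.
    psd = np.einsum("fct,fdt->fcd", weighted, stft.conj())
    norm = np.maximum(mask.sum(axis=-1), 1e-10)
    return psd / norm[:, None, None]


def mvdr_weights(psd_speech, psd_noise, ref_channel=0, eps=1e-6):
    """MVDR weights, assumed form: w = (Phi_N^-1 Phi_S) u / trace(Phi_N^-1 Phi_S).

    u selects the reference channel. Only signal statistics enter the
    computation; no array geometry is required.
    """
    num_channels = psd_noise.shape[-1]
    # Diagonal loading keeps the noise covariance well conditioned.
    scale = np.trace(psd_noise, axis1=-2, axis2=-1).real / num_channels
    psd_noise = psd_noise + eps * scale[:, None, None] * np.eye(num_channels)
    numerator = np.linalg.solve(psd_noise, psd_speech)  # shape (F, C, C)
    trace = np.trace(numerator, axis1=-2, axis2=-1)[:, None]
    return numerator[:, :, ref_channel] / (trace + 1e-10)


def beamform(stft, weights):
    """Apply the beamformer: y[f, t] = w[f]^H x[f, t]."""
    return np.einsum("fc,fct->ft", weights.conj(), stft)
```

In this sketch the speech and noise masks would come from the mask-estimation network; the single-channel output of `beamform` would then go through feature extraction into the acoustic model.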