The RWTH/UPB/FORTH System Combination for the 4th CHiME Challenge Evaluation
Abstract
This paper describes automatic speech recognition (ASR) systems developed jointly by RWTH, UPB and FORTH for the 1ch, 2ch and 6ch tracks of the 4th CHiME Challenge. In the 2ch and 6ch tracks, the final system output is obtained by a Confusion Network Combination (CNC) of multiple systems. The Acoustic Model (AM) is a deep neural network based on Bidirectional Long Short-Term Memory (BLSTM) units. The systems differ in the front ends and training sets used for acoustic training. The model for the 1ch track is trained without any preprocessing. For each front end we trained and evaluated individual acoustic models. We compare the ASR performance of different beamforming approaches: a conventional superdirective beamformer [1] and an MVDR beamformer as in [2], where the steering vector is estimated based on [3]. Furthermore, we evaluated a BLSTM-supported Generalized Eigenvalue beamformer (NN-GEV) [4]. The back end is implemented using RWTH's open-source toolkits RASR [5], RETURNN [6] and rwthlm [7]. We rescore lattices with a Long Short-Term Memory (LSTM) based language model. The overall best results are obtained by a system combination that includes the lattices from UPB's submission [8]. Our final submission scored second in each of the three tracks of the 4th CHiME Challenge.
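To make the MVDR comparison concrete, below is a minimal NumPy sketch of the standard MVDR weight formula w = R_n⁻¹ d / (dᴴ R_n⁻¹ d) for a single frequency bin. This is a textbook illustration only, not the authors' implementation: the noise covariance and steering vector here are hypothetical toy values, whereas the paper estimates the steering vector following [3].

```python
import numpy as np

def mvdr_weights(noise_cov, steering):
    """Textbook MVDR weights w = R_n^{-1} d / (d^H R_n^{-1} d).

    noise_cov: (M, M) complex spatial noise covariance for one frequency bin.
    steering:  (M,) complex steering vector toward the target source.
    """
    # Solve R_n x = d instead of explicitly inverting R_n (better conditioned).
    rn_inv_d = np.linalg.solve(noise_cov, steering)
    return rn_inv_d / (steering.conj() @ rn_inv_d)

# Toy example: 3 microphones, identity noise covariance, hypothetical steering vector.
M = 3
d = np.exp(1j * 2 * np.pi * np.arange(M) * 0.1)
w = mvdr_weights(np.eye(M, dtype=complex), d)

# The distortionless constraint requires w^H d = 1.
print(np.allclose(w.conj() @ d, 1.0))  # True
```

The defining property checked at the end is that the target direction passes undistorted (wᴴd = 1) while the weights minimize output noise power under that constraint.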