Data augmentation, feature combination, and multilingual neural networks to improve ASR and KWS performance for low-resource languages
Abstract
This paper presents the progress of acoustic models for low-resource languages (Assamese, Bengali, Haitian Creole, Lao, Zulu) developed within the second evaluation campaign of the IARPA Babel project. This year, the main focus of the project is on training high-performing automatic speech recognition (ASR) and keyword search (KWS) systems from language resources limited to about 10 hours of transcribed speech data. Optimizing the structure of Multilayer Perceptron (MLP) based feature extraction and switching from the sigmoid activation function to rectified linear units results in about 5% relative improvement over baseline MLP features. Further improvements are obtained when the MLPs are trained on multiple feature streams and by exploiting label-preserving data augmentation techniques like vocal tract length perturbation. Systematic application of these methods improves the unilingual systems by 4-6% absolute in WER and 0.064-0.105 absolute in MTWV. Transfer and adaptation of multilingually trained MLPs lead to additional gains, clearly exceeding the project goal of 0.3 MTWV even when only the limited language pack of the target language is used.
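The vocal tract length perturbation (VTLP) mentioned above augments training data by warping the frequency axis of each utterance with a random factor while keeping the transcription label unchanged. A minimal sketch of the commonly used piecewise-linear warping function is shown below; the parameter names (`alpha`, `f_hi`, `f_nyquist`) are illustrative assumptions, not taken from the paper.

```python
import random

def warp_freq(f, alpha, f_hi, f_nyquist):
    """Piecewise-linear VTLP warp of frequency f (Hz).

    Frequencies up to a boundary are scaled by alpha; above it, the warp
    is linearly interpolated so that the Nyquist frequency maps to itself.
    """
    boundary = f_hi * min(alpha, 1.0) / alpha
    if f <= boundary:
        return alpha * f
    # Interpolate from (boundary, alpha*boundary) to (f_nyquist, f_nyquist).
    return f_nyquist - (f_nyquist - alpha * boundary) * (
        (f_nyquist - f) / (f_nyquist - boundary)
    )

def perturb_filterbank_centers(centers, f_nyquist=8000.0, f_hi=4800.0):
    """Apply one random warp factor per utterance to all filter centers,
    keeping the label unchanged (label-preserving augmentation)."""
    alpha = random.uniform(0.9, 1.1)  # typical warp-factor range
    return [warp_freq(f, alpha, f_hi, f_nyquist) for f in centers]
```

In training, each copy of an utterance would be processed with a freshly drawn warp factor, effectively multiplying the amount of acoustic variation seen by the MLP without requiring new transcriptions.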