Audio augmentation for speech recognition
Citations over time: top 1% of 2015 papers
Abstract
Data augmentation is a common strategy adopted to increase the quantity of training data, avoid overfitting and improve robustness of the models. In this paper, we investigate audio-level speech augmentation methods which directly process the raw signal. The method we particularly recommend is to change the speed of the audio signal, producing 3 versions of the original signal with speed factors of 0.9, 1.0 and 1.1. The proposed technique has a low implementation cost, making it easy to adopt. We present results on 4 different LVCSR tasks with training data ranging from 100 hours to 1000 hours, to examine the effectiveness of audio augmentation in a variety of data scenarios. An average relative improvement of 4.3% was observed across the 4 tasks.
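The speed-perturbation idea in the abstract can be illustrated with a minimal sketch. It assumes the signal is a 1-D NumPy array and uses plain linear-interpolation resampling as a stand-in; the abstract does not say which resampling tool or method the authors used. The helper names `speed_perturb` and `augment` are hypothetical, and only the speed factors 0.9, 1.0 and 1.1 come from the text.

```python
import numpy as np


def speed_perturb(signal, factor):
    """Return `signal` resampled so it plays `factor` times faster.

    Hypothetical helper: linear interpolation is an assumption made for
    this sketch, not the method stated in the abstract.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    # A faster signal (factor > 1) has fewer samples, a slower one has more.
    n_out = int(round(len(signal) / factor))
    # Position in the original signal that each output sample reads from.
    src_positions = np.arange(n_out) * factor
    # np.interp clamps positions past the last sample to the final value.
    return np.interp(src_positions, np.arange(len(signal)), signal)


def augment(signal, factors=(0.9, 1.0, 1.1)):
    """Produce one copy of `signal` per speed factor."""
    return [speed_perturb(signal, f) for f in factors]


if __name__ == "__main__":
    sample_rate = 16000  # assumed sampling rate for the demo
    t = np.arange(sample_rate) / sample_rate
    tone = np.sin(2 * np.pi * 440.0 * t)  # one second of a 440 Hz tone
    for f, y in zip((0.9, 1.0, 1.1), augment(tone)):
        print(f"factor {f}: {len(y)} samples")
```

Because the copies are resampled rather than time-stretched, changing the speed in this way alters both the duration and the pitch of the signal. Keeping the factor 1.0 copy means the original data stays in the training set, so the three copies together triple its size.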