An audio-visual corpus for speech perception and automatic speech recognition
The Journal of the Acoustical Society of America, 2006, Vol. 120(5), pp. 2421–2424
Abstract
An audio-visual corpus has been collected to support the use of common material in speech perception and automatic speech recognition studies. The corpus consists of high-quality audio and video recordings of 1000 sentences spoken by each of 34 talkers. Sentences are simple, syntactically identical phrases such as "place green at B 4 now". Intelligibility tests using the audio signals suggest that the material is easily identifiable in quiet and low levels of stationary noise. The annotated corpus is available on the web for research use.
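Sentences in this corpus follow a single fixed six-word frame (command, color, preposition, letter, digit, adverb), of which "place green at B 4 now" is one instance. As a rough illustration, the sketch below enumerates such a grammar; the specific word lists are assumptions for illustration and are not quoted from the paper.

```python
import itertools

# Hypothetical six-word grammar of the form
#   command - color - preposition - letter - digit - adverb,
# matching the shape of "place green at B 4 now".
# Word lists below are illustrative assumptions, not taken from the paper.
COMMANDS = ["bin", "lay", "place", "set"]
COLORS = ["blue", "green", "red", "white"]
PREPOSITIONS = ["at", "by", "in", "with"]
LETTERS = list("abcdefghijklmnopqrstuvxyz")  # 25 letters, 'w' omitted here
DIGITS = [str(d) for d in range(10)]
ADVERBS = ["again", "now", "please", "soon"]

def all_sentences():
    """Yield every sentence licensed by the fixed six-slot frame."""
    for words in itertools.product(COMMANDS, COLORS, PREPOSITIONS,
                                   LETTERS, DIGITS, ADVERBS):
        yield " ".join(words)

sentences = list(all_sentences())
print(len(sentences))                          # 4*4*4*25*10*4 = 64000
print("place green at b 4 now" in sentences)   # True
```

Because every sentence shares the same syntax, intelligibility differences across talkers and noise conditions can be attributed to the acoustics rather than to linguistic variability, which is what makes such a frame useful for joint perception and recognition studies.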
Related Papers
- Improving acoustic models with captioned multimedia speech (2003), 14 citations
- Influence of Emotional Speech on Continuous Speech Recognition (2020), 2 citations
- Validation of Speech Data for Training Automatic Speech Recognition Systems (2022), 2 citations
- Constructing a speech audio–video corpus by aligning long segments of speech and text (2017), 2 citations
- Recent Progress in Corpus-Based Spontaneous Speech Recognition (Feature Extraction and Acoustic Modelings, Corpus-Based Speech Technologies) (2005)