A segment-based audio-visual speech recognizer
2004, pp. 235–242
Abstract
This paper presents the development and evaluation of a speaker-independent audio-visual speech recognition (AVSR) system that utilizes a segment-based modeling strategy. To support this research, we have collected a new video corpus, called Audio-Visual TIMIT (AV-TIMIT), which consists of four hours of read speech collected from 223 different speakers. This new corpus was used to evaluate our new AVSR system, which incorporates a novel audio-visual integration scheme using segment-constrained Hidden Markov Models (HMMs). Preliminary experiments have demonstrated improvements in phonetic recognition performance when incorporating visual information into the speech recognition process.
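The abstract does not spell out the integration scheme, but a common baseline for audio-visual fusion is a weighted combination of the per-segment log-likelihoods from each stream. The sketch below is a generic illustration of that idea, not the authors' segment-constrained HMM method; all names, scores, and the stream weight are hypothetical.

```python
def fuse_segment_scores(audio_loglik, visual_loglik, stream_weight=0.7):
    """Linearly combine per-segment log-likelihoods from the audio and
    visual streams (stream_weight for audio, 1 - stream_weight for visual).
    The weight and signature are illustrative assumptions."""
    return stream_weight * audio_loglik + (1.0 - stream_weight) * visual_loglik

def best_phone(segment_scores, stream_weight=0.7):
    """Pick the phone hypothesis with the highest fused score.
    segment_scores maps phone label -> (audio_loglik, visual_loglik)."""
    return max(
        segment_scores,
        key=lambda p: fuse_segment_scores(*segment_scores[p], stream_weight),
    )

# Hypothetical scores for one segment: the audio stream is ambiguous
# between /p/ and /b/, but the visual stream (lip closure) favors /p/.
scores = {"p": (-4.0, -1.0), "b": (-3.8, -3.5)}
print(best_phone(scores))  # fused scores: p = -3.1, b = -3.71 -> "p"
```

In practice the stream weight is tuned on held-out data, and can be made dependent on the acoustic signal-to-noise ratio so the visual stream dominates in noisy conditions.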
Related Papers
- → The effect of speech and audio compression on speech recognition performance(2002)36 cited
- → Improving acoustic models with captioned multimedia speech(2003)14 cited
- → A comparative study on phonological feature detection from continuous speech with respect to variable corpus size(2016)5 cited
- → STC-TIMIT: Generation of a Single-channel Telephone Corpus(2008)
- → TIMIT Acoustic-Phonetic Continuous Speech Corpus(2020)2,552 cited