A segment-based audio-visual speech recognizer
2004, pp. 235–242
Abstract
This paper presents the development and evaluation of a speaker-independent audio-visual speech recognition (AVSR) system that utilizes a segment-based modeling strategy. To support this research, we have collected a new video corpus, called Audio-Visual TIMIT (AV-TIMIT), which consists of four hours of read speech collected from 223 different speakers. This new corpus was used to evaluate our new AVSR system, which incorporates a novel audio-visual integration scheme using segment-constrained Hidden Markov Models (HMMs). Preliminary experiments have demonstrated improvements in phonetic recognition performance when incorporating visual information into the speech recognition process.
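The abstract does not spell out the integration scheme, but a common baseline for audio-visual fusion is a weighted combination of the per-segment log-likelihoods from each stream. The sketch below is a generic illustration of that idea, not the authors' segment-constrained HMM method; all names, scores, and the stream weight are hypothetical.

```python
def fuse_segment_scores(audio_loglik, visual_loglik, stream_weight=0.7):
    """Linearly combine per-segment log-likelihoods from the audio and
    visual streams (stream_weight for audio, 1 - stream_weight for visual).
    The weight and signature are illustrative assumptions."""
    return stream_weight * audio_loglik + (1.0 - stream_weight) * visual_loglik

def best_phone(segment_scores, stream_weight=0.7):
    """Pick the phone hypothesis with the highest fused score.
    segment_scores maps phone label -> (audio_loglik, visual_loglik)."""
    return max(
        segment_scores,
        key=lambda p: fuse_segment_scores(*segment_scores[p], stream_weight),
    )

# Hypothetical scores for one segment: the audio stream is ambiguous
# between /p/ and /b/, but the visual stream (lip closure) favors /p/.
scores = {"p": (-4.0, -1.0), "b": (-3.8, -3.5)}
print(best_phone(scores))  # fused scores: p = -3.1, b = -3.71 -> "p"
```

In practice the stream weight is tuned on held-out data, and can be made dependent on the acoustic signal-to-noise ratio so the visual stream dominates in noisy conditions.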
Related Papers
- → The effect of speech and audio compression on speech recognition performance(2002)36 cited
- → Improving acoustic models with captioned multimedia speech(2003)14 cited
- → A comparative study on phonological feature detection from continuous speech with respect to variable corpus size(2016)5 cited
- → STC-TIMIT: Generation of a Single-channel Telephone Corpus(2008)
- → TIMIT Acoustic-Phonetic Continuous Speech Corpus(2020)2,552 cited