Lip Reading using Simple Dynamic Features and a Novel ROI for Feature Extraction
Abstract
Deaf or hard-of-hearing people rely largely on lip-reading to understand speech, demonstrating that humans can understand speech from visual cues alone. Automatic lip-reading systems work in a similar fashion, recovering speech or text from visual information only, such as a video of a person's face. In this paper, an automatic lip-reading system for spoken digit recognition is presented. The system uses simple dynamic features obtained by computing difference images between consecutive frames of the input video. Using this technique, word recognition rates of 83.79% and 65.58% are achieved in speaker-dependent and speaker-independent testing scenarios, respectively. A novel, extended region-of-interest (ROI) that includes the lower jaw and neck region is also introduced; most lip-reading algorithms extract features from the mouth/lip region only. Compared with using the mouth alone as the ROI, the proposed ROI improves performance by 4% in speaker-dependent tests and by 11% in speaker-independent tests.
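The dynamic features described in the abstract, difference images between consecutive video frames, can be sketched as follows. This is a minimal NumPy illustration under our own assumptions (grayscale frames, absolute differences); the function name and details are not from the paper:

```python
import numpy as np

def difference_images(frames):
    """Compute absolute difference images between consecutive frames.

    frames: array of shape (T, H, W), grayscale frames (e.g. uint8).
    Returns an array of shape (T-1, H, W) of per-pixel frame differences,
    which capture motion (lip/jaw movement) rather than static appearance.
    """
    frames = np.asarray(frames, dtype=np.int16)  # widen to avoid uint8 wraparound
    return np.abs(np.diff(frames, axis=0)).astype(np.uint8)

# Example with three tiny synthetic 4x4 "frames" of constant intensity
frames = np.stack([np.full((4, 4), v, dtype=np.uint8) for v in (10, 30, 25)])
diffs = difference_images(frames)
print(diffs.shape)     # (2, 4, 4)
print(diffs[0, 0, 0])  # 20
print(diffs[1, 0, 0])  # 5
```

In a full pipeline, such difference images would be computed over the cropped ROI (mouth, or the extended jaw-and-neck region proposed in the paper) before feature extraction.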