Title :
Improved ROI and within frame discriminant features for lipreading
Author :
Potamianos, Gerasimos ; Neti, Chalapathy
Author_Institution :
IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
fDate :
2001
Abstract :
We study three aspects of designing appearance-based visual features for automatic lipreading: (a) the choice of the video region of interest (ROI) on which image transform features are obtained; (b) the extraction of speech discriminant features at each frame; (c) the use of temporal information to improve visual speech modeling. With respect to (a), we propose a ROI that includes the speaker's jaw and cheeks, in addition to the traditionally used mouth/lip region. With respect to (b) and (c), we propose the use of a two-stage linear discriminant analysis, both within a single frame and across a large number of frames. On a large-vocabulary, continuous-speech, audio-visual database, the proposed visual features result in a 13% absolute reduction in visual-only word error rate over a baseline visual front end, and in an additional 28% relative improvement in audio-visual over audio-only phonetic classification accuracy.
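The two-stage linear discriminant analysis mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes synthetic random vectors in place of real image transform features from the ROI, and the class count, feature dimensions, and window length (`n_classes`, `D1`, `J`, `D2`) are hypothetical values chosen only for illustration. The helper names `lda_fit` and `stack_frames` are likewise made up for this example.

```python
import numpy as np

def lda_fit(X, y, n_out):
    """Return a (d, n_out) projection that maximizes between-class
    scatter relative to within-class scatter (classical LDA)."""
    d = X.shape[1]
    mean = X.mean(axis=0)
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Small ridge keeps Sw well conditioned (an assumption of this sketch).
    Sw += 1e-6 * np.trace(Sw) / d * np.eye(d)
    # Whiten Sw, then eigendecompose Sb in the whitened space.
    w_vals, w_vecs = np.linalg.eigh(Sw)
    whiten = w_vecs / np.sqrt(w_vals)  # scales column j by 1/sqrt(w_vals[j])
    b_vals, b_vecs = np.linalg.eigh(whiten.T @ Sb @ whiten)
    order = np.argsort(b_vals)[::-1][:n_out]
    return whiten @ b_vecs[:, order]

def stack_frames(F, J):
    """Concatenate each frame with its neighbors in a window of J frames
    (J odd); sequence edges are padded by repeating the end frames."""
    half = J // 2
    padded = np.pad(F, ((half, half), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + len(F)] for i in range(J)])

# Synthetic stand-in for per-frame image transform features (hypothetical sizes).
rng = np.random.default_rng(0)
n_frames, n_dct, n_classes = 600, 24, 8
labels = np.repeat(np.arange(n_frames // 10) % n_classes, 10)  # 10-frame segments
class_means = rng.normal(size=(n_classes, n_dct))
dct_feats = class_means[labels] + rng.normal(size=(n_frames, n_dct))

D1, J, D2 = 6, 5, 4  # hypothetical output dimensions and window length

# Stage 1: discriminant projection within each single frame.
P1 = lda_fit(dct_feats, labels, D1)
stage1 = dct_feats @ P1

# Stage 2: discriminant projection across a window of J neighboring frames.
stacked = stack_frames(stage1, J)
P2 = lda_fit(stacked, labels, D2)
visual_feats = stacked @ P2
```

In this sketch the first projection reduces each frame independently, and the second operates on concatenated neighboring frames so that temporal context enters the final feature vector. The actual dimensions and class definitions used in the paper are not given in this record.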
Keywords :
discrete cosine transforms; feature extraction; image recognition; image sequences; speech recognition; audio-visual database; automatic lipreading; automatic speech recognition; continuous speech database; discrete cosine transform; discriminant features; large vocabulary database; linear discriminant analysis; speech discriminant feature extraction; temporal information; video region of interest; visual speech modeling; Algorithm design and analysis; Automatic speech recognition; Discrete cosine transforms; Discrete wavelet transforms; Feature extraction; Linear discriminant analysis; Mouth; Shape; Speech recognition; Vocabulary;
Conference_Titel :
Image Processing, 2001. Proceedings. 2001 International Conference on
Conference_Location :
Thessaloniki
Print_ISBN :
0-7803-6725-1
DOI :
10.1109/ICIP.2001.958098