DocumentCode :
3209665
Title :
Asymmetrically boosted HMM for speech reading
Author :
Yin, Pei ; Essa, Irfan ; Rehg, James M.
Author_Institution :
GVU Center, Georgia Inst. of Technol., Atlanta, GA, USA
Volume :
2
fYear :
2004
fDate :
27 June-2 July 2004
Abstract :
Speech reading, also known as lip reading, aims to extract visual cues from lip and facial movements to aid speech recognition. The main hurdle for speech reading is that visual measurements of lip and facial motion lack information-rich features like the Mel-frequency cepstral coefficients (MFCC) widely used in acoustic speech recognition. At present, most speech recognition systems use MFCC with hidden Markov models (HMM). Speech reading could greatly benefit from automatic selection and formation of informative features from measurements in the visual domain. These new features can then be used with HMM to capture the dynamics of lip movement and, eventually, to recognize lip shapes. Toward this end, we use AdaBoost methods for automatic visual feature formation. Specifically, we design an asymmetric variant of the AdaBoost.M2 algorithm to deal with the ill-posed multi-class sample distribution inherent in our problem. Our experiments show that the boosted HMM approach outperforms conventional AdaBoost and HMM classifiers. Our primary contributions are the design of (a) the boosted HMM and (b) asymmetric multi-class boosting.
Keywords :
cepstral analysis; hidden Markov models; speech recognition; AdaBoost methods; Mel frequency cepstral coefficients; acoustic speech recognition; asymmetrically boosted HMM; automatic visual feature formation; facial movements; hidden Markov models; ill-posed multiclass sample distribution; lip reading; speech reading; visual cues extraction; Acoustic measurements; Algorithm design and analysis; Boosting; Face recognition; Frequency measurement; Hidden Markov models; Mel frequency cepstral coefficient; Motion measurement; Shape; Speech recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on
ISSN :
1063-6919
Print_ISBN :
0-7695-2158-4
Type :
conf
DOI :
10.1109/CVPR.2004.1315240
Filename :
1315240