DocumentCode
3249489
Title
Dynamic visual features based on discriminative speech class projection for visual speech recognition
Author
Lei, Xie ; Xiu-Li, Cai ; Zhong-Hua, Fu ; Rong-Chun, Zhao
Author_Institution
Sch. of Comput. Sci., Northwestern Polytech. Univ., Xi'an, China
fYear
2004
fDate
20-22 Oct. 2004
Firstpage
687
Lastpage
690
Abstract
This paper presents a dynamic visual feature extraction scheme that captures important lip-motion information for visual speech recognition. Discriminative projections based on a priori chosen speech classes (phonemes and visemes) are applied to the concatenation of pre-extracted static visual features. First- and second-order temporal derivatives are then extracted to further represent the dynamic differences. Experiments on a connected-digits task demonstrate that the proposed highly discriminative dynamic features, when appended to the static features, yield superior recognition performance. Compared with the commonly used delta and acceleration features, the proposed dynamic features give an 8% absolute improvement in word accuracy on the considered recognition task.
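The pipeline in the abstract (stack neighbouring static frames, project them discriminatively onto speech classes, then take first- and second-order temporal derivatives) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes linear discriminant analysis (listed among the keywords) as the discriminative projection, a symmetric window of `context` frames on each side, HTK-style regression deltas with half-width `N`, and a final vector of static + projected + first derivative + second derivative. All function names and parameters (`stack_frames`, `lda_projection`, `deltas`, `dynamic_features`, `context`, `N`, `reg`) are hypothetical. The abstract does not give the paper's window sizes or output dimensions.

```python
import numpy as np


def stack_frames(feats, context):
    """Concatenate each frame with its +/- `context` neighbours (edges padded by repetition)."""
    T = feats.shape[0]
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    # Block i holds frame t - context + i, so the output row t spans frames t-context..t+context.
    return np.hstack([padded[i:i + T] for i in range(2 * context + 1)])


def lda_projection(X, labels, n_components, reg=1e-6):
    """LDA: solve Sb w = lambda Sw w and keep the top eigenvectors.

    Sb has rank <= (number of classes - 1), so n_components should not exceed that.
    """
    d = X.shape[1]
    mean = X.mean(axis=0)
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    Sw += reg * np.eye(d)  # small ridge so the Cholesky factorisation succeeds
    # Whitening: with Sw = L L^T and w = L^{-T} v, the problem becomes symmetric in v.
    Linv = np.linalg.inv(np.linalg.cholesky(Sw))
    evals, evecs = np.linalg.eigh(Linv @ Sb @ Linv.T)
    order = np.argsort(evals)[::-1][:n_components]
    return Linv.T @ evecs[:, order]  # columns are discriminant directions


def deltas(feats, N=2):
    """Regression deltas: d_t = sum_n n (c_{t+n} - c_{t-n}) / (2 sum_n n^2)."""
    T = feats.shape[0]
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")
    out = np.zeros_like(feats, dtype=float)
    for n in range(1, N + 1):
        out += n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
    return out / (2 * sum(n * n for n in range(1, N + 1)))


def dynamic_features(static, W, context, N=2):
    """Static features augmented with projected features and their first/second derivatives."""
    projected = stack_frames(static, context) @ W
    d1 = deltas(projected, N)
    d2 = deltas(d1, N)
    return np.hstack([static, projected, d1, d2])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d, context = 200, 8, 2
    static = rng.normal(size=(T, d))        # stand-in for per-frame static lip features
    labels = rng.integers(0, 4, size=T)     # hypothetical per-frame viseme labels
    W = lda_projection(stack_frames(static, context), labels, n_components=3)
    print(dynamic_features(static, W, context).shape)  # (200, 17)
```

Projecting the stacked window, rather than differencing raw frames as the standard delta and acceleration features do, lets the class labels decide which temporal patterns are kept. That is the contrast the abstract draws against the delta/acceleration baseline.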
Keywords
feature extraction; hidden Markov models; image sequences; speech recognition; MPEG-1 video; concatenated pre-extracted static visual features; discriminative dynamic features; discriminative speech class projection; dynamic visual feature extraction; linear discriminant analysis; lip motion information; mouth image sequences; phonemes; temporal derivatives; visemes; visual speech recognition; word accuracy; Acoustic noise; Auditory system; Automatic speech recognition; Data mining; Feature extraction; Hidden Markov models; Humans; Noise robustness; Speech processing; Speech recognition;
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Multimedia, Video and Speech Processing, 2004. Proceedings of 2004 International Symposium on
Print_ISBN
0-7803-8687-6
Type
conf
DOI
10.1109/ISIMP.2004.1434157
Filename
1434157