DocumentCode :
2163831
Title :
Audio-visual speech recognition by speechreading
Author :
Zhang, Xiaozheng ; Mersereau, Russell M. ; Clements, Mark A.
Author_Institution :
Sch. of Electr. & Comput. Eng., Georgia Inst. of Technol., Atlanta, GA, USA
Volume :
2
fYear :
2002
fDate :
2002
Firstpage :
1069
Abstract :
Speechreading increases intelligibility in human speech perception. This suggests that conventional acoustic-based speech processing can benefit from the addition of visual information. This paper exploits speechreading for joint audio-visual speech recognition. We first present a color-based feature extraction algorithm that is able to extract salient visual speech features reliably from a frontal view of the talker in a video sequence. Then, a new fusion strategy using a coupled hidden Markov model (CHMM) is proposed to incorporate visual modality into the acoustic subsystem. By maintaining temporal coupling across the two modalities at the feature level and allowing asynchrony in the state at the same time, a CHMM provides a better model for capturing temporal correlations between the two streams of information. The experimental results demonstrate that the combined audio-visual system outperforms the acoustic-only recognizer over a wide range of noise levels.
Keywords :
feature extraction; gesture recognition; hidden Markov models; image colour analysis; image sequences; speech recognition; video signal processing; audio-visual speech recognition; coupled hidden Markov model; feature extraction; human speech perception; speechreading; temporal correlations; video sequence; visual information; visual modality; visual speech features; Audio-visual systems; Data mining; Feature extraction; Hidden Markov models; Humans; Maintenance; Speech processing; Speech recognition; Streaming media; Video sequences;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Digital Signal Processing, 2002. DSP 2002. 2002 14th International Conference on
Print_ISBN :
0-7803-7503-3
Type :
conf
DOI :
10.1109/ICDSP.2002.1028275
Filename :
1028275