DocumentCode :
2664185
Title :
Visual information assisted Mandarin large vocabulary continuous speech recognition
Author :
Liu, Peng ; Wang, Zuoying
Author_Institution :
Dept. of Electron. Eng., Tsinghua Univ., Beijing, China
fYear :
2003
fDate :
26-29 Oct. 2003
Firstpage :
72
Lastpage :
77
Abstract :
We present a general system framework for Mandarin audio-visual large vocabulary continuous speech recognition (LVCSR) that integrates visual information to improve recognition performance and robustness. Three main problems of audio-visual LVCSR are addressed: lip tracking, visual feature extraction, and audio-visual fusion. First, linear-transform-based lip tracking and low-level visual feature extraction methods are presented and compared with lip-contour-based feature extraction. Next, an audio-visual fusion strategy based on the multistream hidden Markov model (MSHMM) is investigated, and a novel approach is presented for training global or state-dependent stream weights using the minimum classification error (MCE) criterion. Experimental results show that introducing visual information reduces the word error rate (WER) of the LVCSR system by a relative 36.09% on clean audio and also significantly improves system robustness in noisy environments.
Keywords :
audio-visual systems; feature extraction; hidden Markov models; natural languages; speech enhancement; speech recognition; vocabulary; Mandarin large vocabulary continuous speech recognition; audio-visual fusion; audio-visual speech recognition; classification error; hidden Markov model; lip tracking; noise environment; visual feature extraction; visual information; Active shape model; Automatic speech recognition; Error analysis; Feature extraction; Hidden Markov models; Lips; Noise robustness; Speech recognition; Streaming media; Vocabulary;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Natural Language Processing and Knowledge Engineering, 2003. Proceedings. 2003 International Conference on
Conference_Location :
Beijing, China
Print_ISBN :
0-7803-7902-0
Type :
conf
DOI :
10.1109/NLPKE.2003.1275871
Filename :
1275871