Title :
Minimum phone error based stream weight training for Mandarin audio-visual speech recognition
Author :
Wu, Guanyong ; Zhu, Jie ; Xu, Haihua
Author_Institution :
Dept. of Electron. Eng., Shanghai Jiao Tong Univ., Shanghai, China
Date :
June 28 2009-July 3 2009
Abstract :
Stream weight training is one of the key issues in bimodal integration for audio-visual speech recognition. In this paper, audio-only and video-only HMM classifiers are combined to perform audio-visual speech recognition. More specifically, a discriminative training method is proposed in which state-dependent stream weights are trained by lattice rescoring under the minimum phone error criterion, using the extended Baum-Welch algorithm. The proposed method is evaluated on our Mandarin large vocabulary audio-visual database. Experimental results show that the proposed method achieves a significant error reduction over the traditional global stream weight based approach and outperforms the minimum classification error based discriminative stream weight training method.
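As a rough illustration of what a state-dependent stream weight does at decoding time, the sketch below combines audio and video log-likelihoods for one HMM state. It is not the paper's training procedure: the function names, the example weights, and the constraint that the two stream weights sum to 1 are all assumptions made for illustration, and the minimum phone error update itself is not shown.

```python
# Minimal sketch of multi-stream score combination (illustrative only).
# Assumption: audio and video weights for a state sum to 1.

def combined_log_likelihood(log_b_audio, log_b_video, audio_weight):
    """Weighted sum of per-stream log-likelihoods for one HMM state."""
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return audio_weight * log_b_audio + (1.0 - audio_weight) * log_b_video


# Hypothetical state-dependent audio weights (made-up values).
STATE_AUDIO_WEIGHTS = {"s1": 0.8, "s2": 0.5}


def score_state(state, log_b_audio, log_b_video,
                weights=STATE_AUDIO_WEIGHTS, global_weight=0.7):
    """Use the state's own weight if present, else fall back to a global weight."""
    w = weights.get(state, global_weight)
    return combined_log_likelihood(log_b_audio, log_b_video, w)
```

With a single global weight every state trusts the audio stream equally; a state-dependent table lets individual states lean more on the video stream, which is the degree of freedom the discriminative training adjusts.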
Keywords :
audio-visual systems; hidden Markov models; speech recognition; HMM classifier; Mandarin large vocabulary audio-visual database; bimodal integration; discriminative training method; extended Baum Welch algorithm; hidden Markov model; Mandarin audio-visual speech recognition; minimum classification error; minimum phone error; state-dependent stream weight training method; stream weight training method; Audio databases; Automatic speech recognition; Hidden Markov models; Lattices; Lips; Maximum likelihood estimation; Speech recognition; Streaming media; Visual databases; Vocabulary; Audio-visual speech recognition (AVSR); Discriminative training; Minimum phone error (MPE);
Conference_Title :
Multimedia and Expo, 2009. ICME 2009. IEEE International Conference on
Conference_Location :
New York, NY
Print_ISBN :
978-1-4244-4290-4
Electronic_ISBN :
1945-7871
DOI :
10.1109/ICME.2009.5202641