Title :
Speaker independent audio-visual speech recognition
Author :
Zhang, You ; Levinson, Stephen ; Huang, Thomas
Author_Institution :
Beckman Inst. for Adv. Sci. & Technol., Illinois Univ., Urbana, IL, USA
Abstract :
We present a general framework for integrating multimodal sensory signals for spatio-temporal pattern recognition. Statistical methods are used to model time-varying events jointly across modalities, so that inter-modal co-occurrences are taken into account. We discuss various data fusion strategies, the modeling of inter-modal correlations, and the extraction of statistical parameters for multimodal models. A bimodal speech recognition system is implemented. A speaker-independent experiment is carried out to test the audio-visual speech recognizer under different kinds of noise from a noise database. Using a cross-validation scheme, consistent improvements in word recognition accuracy (WRA) are achieved across different signal-to-noise ratios.
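The abstract refers to data fusion strategies and to testing under noise at several signal-to-noise ratios. The sketch below illustrates two generic building blocks under stated assumptions; it is not the authors' method, and the function names `mix_at_snr` and `fuse_scores` and the parameter `audio_weight` are hypothetical. `mix_at_snr` scales a noise recording so that the mixture has a target SNR in dB (from mean power). `fuse_scores` shows one common strategy, late (decision-level) fusion, where per-word audio and visual log-likelihoods are combined with a weight and the best-scoring word is chosen. The paper's own modeling of inter-modal correlations is not reproduced here.

```python
import math
from typing import Dict, List


def mix_at_snr(clean: List[float], noise: List[float], snr_db: float) -> List[float]:
    """Add noise to a clean signal so the mixture has the requested SNR (dB).

    Hypothetical helper: assumes len(noise) >= len(clean) and defines SNR
    as the ratio of mean signal power to mean (scaled) noise power.
    """
    n = len(clean)
    p_signal = sum(x * x for x in clean) / n
    p_noise = sum(v * v for v in noise[:n]) / n
    # Pick gain g so that p_signal / (g**2 * p_noise) == 10 ** (snr_db / 10).
    gain = math.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return [x + gain * v for x, v in zip(clean, noise[:n])]


def fuse_scores(
    audio_ll: Dict[str, float],
    visual_ll: Dict[str, float],
    audio_weight: float,
) -> str:
    """Late fusion: weighted sum of per-word audio and visual log-likelihoods.

    Hypothetical helper: audio_weight in [0, 1] sets how much the audio
    stream is trusted; lower values shift the decision toward the visual stream.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    fused = {
        word: audio_weight * audio_ll[word] + (1.0 - audio_weight) * visual_ll[word]
        for word in audio_ll
    }
    # Return the vocabulary word with the highest fused score.
    return max(fused, key=fused.get)


if __name__ == "__main__":
    audio = {"yes": -10.0, "no": -12.0}
    visual = {"yes": -9.0, "no": -5.0}
    print(fuse_scores(audio, visual, audio_weight=0.9))  # audio dominates
    print(fuse_scores(audio, visual, audio_weight=0.2))  # visual dominates
```

With these toy scores, trusting the audio stream (`audio_weight=0.9`) selects "yes", while shifting trust to the visual stream (`audio_weight=0.2`) selects "no". In a noisy setting one would typically lower the audio weight as the SNR drops; how such a weight is chosen or learned is a design decision the abstract does not specify.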
Keywords :
audio-visual systems; multimedia systems; pattern recognition; sensor fusion; speech recognition; audio-visual speech recognizer; bimodal speech recognition system; cross-validation scheme; data fusion strategies; inter-modal co-occurrence; inter-modal correlations; multi-modal models; multimodal sensory signals; noise database; signal-to-noise ratios; spatial temporal pattern recognition; speaker independent audio-visual speech recognition; speaker-independent experiment; statistical methods; statistical parameters; time varying events; word recognition accuracy; Acoustic noise; Automatic speech recognition; Decoding; Hidden Markov models; Humans; Mouth; Spatial resolution; Speech analysis; Speech enhancement; Speech recognition
Conference_Title :
2000 IEEE International Conference on Multimedia and Expo (ICME 2000)
Conference_Location :
New York, NY
Print_ISBN :
0-7803-6536-4
DOI :
10.1109/ICME.2000.871546