Title :
Dynamic Stream Weight Modeling for Audio-Visual Speech Recognition
Author :
Marcheret, E. ; Libal, V. ; Potamianos, Gerasimos
Author_Institution :
IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
Abstract :
To achieve optimal multi-stream audio-visual speech recognition performance, appropriate dynamic weighting of each modality is desired. In this paper, we propose to estimate such weights based on a combination of acoustic signal space observations and single-modality audio and visual speech model likelihoods. Two modeling approaches are investigated for such weight estimation: one based on a sigmoid fitting function, the other employing Gaussian mixture models. Reported experiments demonstrate that the latter approach outperforms sigmoid-based modeling, and is dramatically superior to the static weighting scheme.
Keywords :
Gaussian processes; audio-visual systems; speech processing; speech recognition; Gaussian mixture models; acoustic signal space observations; dynamic stream weight modeling; optimal multistream audio-visual speech recognition; sigmoid fitting function; single-modality audio; static weighting scheme; Automatic speech recognition; Fuses; Hidden Markov models; Linear discriminant analysis; Robustness; Speech processing; Speech recognition; Streaming media; Table lookup; Testing; Audio-Visual Speech Recognition; Multi-Modal Fusion; Multi-Stream HMM; Speech Processing;
Conference_Title :
Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on
Conference_Location :
Honolulu, HI
Print_ISBN :
1-4244-0727-3
DOI :
10.1109/ICASSP.2007.367227