Title :
Audio-Visual Speech Fusion Using Coupled Hidden Markov Models
Author :
Chu, Stephen M. ; Huang, Thomas S.
Author_Institution :
IBM T. J. Watson Res. Center, Yorktown Heights
Abstract :
The fusion of audio and visual speech is an instance of the general sensory fusion problem, which arises when multiple channels carry complementary information about different components of a system. In the case of audio-visual speech, the two modalities manifest two aspects of the same underlying speech production process. From an observer's view, the audio channel and the visual channel represent two interacting stochastic processes. We seek a framework that can model the two individual processes as well as their dynamic interactions. One interesting aspect of audio-visual speech is the inherent asynchrony between the audio and visual channels. Most early integration approaches to the fusion problem assume tight synchrony between the two. However, studies have shown that human perception of bimodal speech does not require rigid synchronization of the two modalities. Furthermore, humans appear to use audio-visual asynchronies as multimodal features. For example, it is well known that voice onset time is an important cue to the voicing feature in stop consonants. This information can be conveyed bimodally by the interval between seeing the stop release and hearing the vocal cord vibration. Therefore, a successful fusion scheme should not only tolerate asynchrony between the audio and visual cues, but also be able to capture and exploit this bimodal feature.
Keywords :
audio-visual systems; hidden Markov models; sensor fusion; speech recognition; audio channel; audio-visual speech fusion; bimodal speech recognition system; hidden Markov model; sensory fusion; stochastic process; visual channel; Approximation methods; Auditory system; Bayesian methods; Fuses; Hidden Markov models; Humans; Joining processes; Nearest neighbor searches; Speech processing; Stochastic processes;
Conference_Title :
Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on
Conference_Location :
Minneapolis, MN
Print_ISBN :
1-4244-1179-3
ISSN :
1063-6919
DOI :
10.1109/CVPR.2007.383524