DocumentCode :
900494
Title :
Visual model structures and synchrony constraints for audio-visual speech recognition
Author :
Hazen, Timothy J.
Author_Institution :
Comput. Sci. & Artificial Intelligence Lab., Massachusetts Inst. of Technol., Cambridge, MA, USA
Volume :
14
Issue :
3
fYear :
2006
fDate :
5/1/2006
Firstpage :
1082
Lastpage :
1089
Abstract :
This paper presents the design and evaluation of a speaker-independent audio-visual speech recognition (AVSR) system that utilizes a segment-based modeling strategy. The audio and visual feature streams are integrated using a segment-constrained hidden Markov model, which allows the visual classifier to process visual frames with a constrained amount of asynchrony relative to proposed acoustic segments. The core experiments in this paper investigate several different visual model structures, each of which provides a different means for defining the units of the visual classifier and the synchrony constraints between the audio and visual streams. Word recognition experiments are conducted on the AV-TIMIT corpus under variable additive noise conditions. Over varying acoustic signal-to-noise ratios, word error rate reductions between 14% and 60% are observed when integrating the visual information into the automatic speech recognition process.
Keywords :
audio-visual systems; hidden Markov models; speech recognition; acoustic signal-to-noise ratios; audio-visual speech recognition; segment-based modeling strategy; segment-constrained hidden Markov model; speaker-independent recognition; synchrony constraints; visual model structures; Additive noise; Automatic speech recognition; Ear; Error analysis; Hidden Markov models; Humans; Signal to noise ratio; Speech processing; Speech recognition; Streaming media; Audio-visual speech recognition; lip-reading; multimodal speech processing;
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TSA.2005.857572
Filename :
1621219