  • DocumentCode
    900494
  • Title
    Visual model structures and synchrony constraints for audio-visual speech recognition
  • Author
    Hazen, Timothy J.
  • Author_Institution
    Comput. Sci. & Artificial Intelligence Lab., Massachusetts Inst. of Technol., Cambridge, MA, USA
  • Volume
    14
  • Issue
    3
  • fYear
    2006
  • fDate
    5/1/2006 12:00:00 AM
  • Firstpage
    1082
  • Lastpage
    1089
  • Abstract
    This paper presents the design and evaluation of a speaker-independent audio-visual speech recognition (AVSR) system that utilizes a segment-based modeling strategy. The audio and visual feature streams are integrated using a segment-constrained hidden Markov model, which allows the visual classifier to process visual frames with a constrained amount of asynchrony relative to proposed acoustic segments. The core experiments in this paper investigate several different visual model structures, each of which provides a different means for defining the units of the visual classifier and the synchrony constraints between the audio and visual streams. Word recognition experiments are conducted on the AV-TIMIT corpus under variable additive noise conditions. Over varying acoustic signal-to-noise ratios, word error rate reductions between 14% and 60% are observed when integrating the visual information into the automatic speech recognition process.
  • Keywords
    audio-visual systems; hidden Markov models; speech recognition; acoustic signal-to-noise ratios; audio-visual speech recognition; segment-based modeling strategy; segment-constrained hidden Markov model; speaker-independent recognition; synchrony constraints; visual model structures; Additive noise; Automatic speech recognition; Ear; Error analysis; Hidden Markov models; Humans; Signal to noise ratio; Speech processing; Speech recognition; Streaming media; Audio-visual speech recognition; lip-reading; multimodal speech processing
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TSA.2005.857572
  • Filename
    1621219