• DocumentCode
    2575146
  • Title

    Multistage information fusion for audio-visual speech recognition

  • Author

    Chu, S.M. ; Libal, V. ; Marcheret, E. ; Neti, C. ; Potamianos, Gerasimos

  • Author_Institution
    IBM T. J. Watson Res. Center, Yorktown Heights, NY, USA
  • Volume
    3
  • fYear
    2004
  • fDate
    27-30 June 2004
  • Firstpage
    1651
  • Abstract
    The paper looks into the information fusion problem in the context of audio-visual speech recognition. Existing approaches to audio-visual fusion typically address the problem in either the feature domain or the decision domain. We consider a hybrid approach that aims to take advantage of both the feature fusion and the decision fusion methodologies. We introduce a general formulation to facilitate information fusion at multiple stages, followed by an experimental study of a set of fusion schemes allowed by the framework. The proposed method is implemented on a real-time audio-visual speech recognition system, and evaluated on connected digit recognition tasks under varying acoustic conditions. The results show that the multistage fusion system consistently achieves lower word error rates than the reference feature fusion and decision fusion systems. It is further shown that removing the audio-only channel from the multistage system leads to only minimal degradation in recognition performance while providing a noticeable reduction in computational load.
  • Keywords
    audio signal processing; error statistics; sensor fusion; speech recognition; video signal processing; audio-visual speech recognition; connected digit recognition tasks; decision fusion methods; feature fusion methods; multistage fusion system; multistage information fusion; word error rates; Degradation; Error analysis; Humans; Performance gain; Speech recognition; Streaming media
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Multimedia and Expo, 2004. ICME '04. 2004 IEEE International Conference on
  • Print_ISBN
    0-7803-8603-5
  • Type

    conf

  • DOI
    10.1109/ICME.2004.1394568
  • Filename
    1394568