• DocumentCode
    1749631
  • Title

    Weighting schemes for audio-visual fusion in speech recognition

  • Author

    Glotin, Hervé ; Vergyri, D. ; Neti, Chalapathy ; Potamianos, Gerasimos ; Luettin, Juergen

  • Author_Institution
    ICP, Grenoble, France
  • Volume
    1
  • fYear
    2001
  • fDate
    2001
  • Firstpage
    173
  • Abstract
    We demonstrate an improvement in state-of-the-art large vocabulary continuous speech recognition (LVCSR) performance, under clean and noisy conditions, by using visual information in addition to the traditional audio information. We take a decision fusion approach to the audio-visual information, in which the single-modality (audio-only and visual-only) HMM classifiers are combined to recognize audio-visual speech. More specifically, we tackle the problem of estimating the appropriate combination weights for each of the modalities. Two different techniques are described: the first uses an automatically extracted estimate of the audio stream reliability to modify the weights for each modality (both clean and noisy audio results are reported), while the second is a discriminative model combination approach in which weights on pre-defined model classes are optimized to minimize WER (clean audio results only).
  • Keywords
    Gaussian distribution; audio signal processing; decision theory; hidden Markov models; sensor fusion; speech recognition; video signal processing; audio stream; audio-visual fusion; clean conditions; decision fusion approach; discriminative model combination approach; large vocabulary continuous speech recognition; noisy conditions; single-modality HMM classifiers; visual information; weighting schemes; Acoustic noise; Art; Audio databases; Automatic speech recognition; Hidden Markov models; Signal to noise ratio; Speech enhancement; Speech recognition; Streaming media; Vocabulary
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 2001. Proceedings. (ICASSP '01). 2001 IEEE International Conference on
  • Conference_Location
    Salt Lake City, UT
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-7041-4
  • Type

    conf

  • DOI
    10.1109/ICASSP.2001.940795
  • Filename
    940795