• DocumentCode
    2892107
  • Title
    Boosted audio-visual HMM for speech reading
  • Author
    Yin, Pei ; Essa, Irfan ; Rehg, James M.
  • Author_Institution
    GVU Center, Georgia Inst. of Technol., Atlanta, GA, USA
  • Volume
    2
  • fYear
    2003
  • fDate
    9-12 Nov. 2003
  • Firstpage
    2013
  • Abstract
    We propose a new approach for combining acoustic and visual measurements to aid in recognizing the lip shapes of a person speaking. Our method relies on computing the maximum likelihoods of (a) an HMM used to model phonemes from the acoustic signal, and (b) an HMM used to model visual feature motions from video. One significant addition in this work is dynamic analysis with features selected by AdaBoost on the basis of their discriminative ability. This form of integration, leading to a boosted HMM, lets AdaBoost find the best features first and then uses the HMM to exploit the dynamic information inherent in the signal.
  • Keywords
    audio-visual systems; feature extraction; hidden Markov models; maximum likelihood estimation; speech processing; speech recognition; video signal processing; AdaBoost; acoustic measurement; acoustic signal; boosted audio-visual HMM; dynamic analysis; feature selection; hidden Markov model; lip shape recognition; maximum likelihood; phoneme model; speech reading; video signal; visual feature motion; visual measurement; Acoustic applications; Acoustic measurements; Educational institutions; Face detection; Hidden Markov models; Information analysis; Natural languages; Shape measurement; Signal analysis; Speech recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signals, Systems and Computers, 2004. Conference Record of the Thirty-Seventh Asilomar Conference on
  • Print_ISBN
    0-7803-8104-1
  • Type
    conf
  • DOI
    10.1109/ACSSC.2003.1292334
  • Filename
    1292334
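
The abstract describes a two-stage idea: AdaBoost first picks discriminative features, then HMMs model the dynamics of the audio and visual streams, and their likelihoods are combined. The following is a minimal sketch of that idea, not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`adaboost_select`, `log_forward`, `classify`), the use of threshold stumps as weak learners, discrete-observation HMMs, and a weighted sum of log-likelihoods as the fusion rule (`audio_weight` is a hypothetical parameter). The record does not state how the paper actually fuses the two streams.

```python
import math

def adaboost_select(X, y, rounds):
    """Rank features with AdaBoost over threshold stumps (assumed weak learner).
    X: list of feature vectors, y: labels in {-1, +1}.
    Returns feature indices in the order the stumps first picked them."""
    m, d = len(X), len(X[0])
    w = [1.0 / m] * m                      # uniform sample weights
    chosen = []
    for _ in range(rounds):
        best = None                        # (weighted error, feature, threshold, polarity)
        for j in range(d):
            for thr in sorted({x[j] for x in X}):
                for pol in (1, -1):
                    err = sum(wi for wi, x, yi in zip(w, X, y)
                              if (pol if x[j] > thr else -pol) != yi)
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol)
        err, j, thr, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)   # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        # Raise the weight of misclassified samples, lower the rest.
        w = [wi * math.exp(-alpha * yi * (pol if x[j] > thr else -pol))
             for wi, x, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
        if j not in chosen:
            chosen.append(j)
    return chosen

def log_forward(obs, start, trans, emit):
    """Log-likelihood of a non-empty discrete observation sequence under an HMM,
    computed with the scaled forward algorithm."""
    n = len(start)
    alpha, loglik = None, 0.0
    for t, o in enumerate(obs):
        if t == 0:
            alpha = [start[s] * emit[s][o] for s in range(n)]
        else:
            alpha = [sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][o]
                     for s in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")           # sequence impossible under this model
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik

def classify(audio_obs, visual_obs, models, audio_weight=0.5):
    """Pick the label whose audio and visual HMMs jointly score highest.
    models: {label: {"audio": (start, trans, emit), "visual": (start, trans, emit)}}.
    The weighted log-likelihood sum is an assumed fusion rule."""
    scores = {
        label: audio_weight * log_forward(audio_obs, *hmms["audio"])
               + (1.0 - audio_weight) * log_forward(visual_obs, *hmms["visual"])
        for label, hmms in models.items()
    }
    return max(scores, key=scores.get)
```

In this sketch, the indices returned by `adaboost_select` would be used to keep only the selected feature dimensions before the observation sequences are quantized and passed to the HMMs. That pipeline order follows the abstract's description that AdaBoost finds the best features first and the HMM then exploits the dynamics.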