  • DocumentCode
    2218876
  • Title
    Boosted audio-visual HMM for speech reading
  • Author
    Yin, Pei ; Essa, Irfan ; Rehg, James M.
  • Author_Institution
    Coll. of Comput., Georgia Inst. of Technol., Atlanta, GA, USA
  • fYear
    2003
  • fDate
    17 Oct. 2003
  • Firstpage
    68
  • Lastpage
    73
  • Abstract
    We propose a new approach for combining acoustic and visual measurements to aid in recognizing the lip shapes of a person speaking. Our method relies on computing the maximum likelihoods of (a) HMMs used to model phonemes from the acoustic signal, and (b) HMMs used to model the motion of visual features from video. One significant addition is dynamic analysis with features selected by AdaBoost on the basis of their discriminative ability. This form of integration, leading to a boosted HMM, lets AdaBoost find the best features first and then uses HMMs to exploit the dynamic information inherent in the signal.
  • Keywords
    acoustic signal processing; feature extraction; hidden Markov models; image motion analysis; maximum likelihood estimation; speech recognition; AdaBoost; HMM maximum likelihoods; acoustic signal measurement; boosted audio-visual HMM; dynamic information; lip shapes recognition; speech reading; visual features motion measurement; Acoustic applications; Acoustic measurements; Educational institutions; Face detection; Hidden Markov models; Maximum likelihood detection; Natural languages; Shape measurement; Signal analysis; Speech recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Analysis and Modeling of Faces and Gestures, 2003. AMFG 2003. IEEE International Workshop on
  • Print_ISBN
    0-7695-2010-3
  • Type
    conf
  • DOI
    10.1109/AMFG.2003.1240826
  • Filename
    1240826