Boosted audio-visual HMM for speech reading

Author

Yin, Pei ; Essa, Irfan ; Rehg, James M.

Author_Institution

GVU Center, Georgia Inst. of Technol., Atlanta, GA, USA

Volume

2

fYear

2003

fDate

9-12 Nov. 2003

Firstpage

2013

Abstract

We propose a new approach for combining acoustic and visual measurements to aid in recognizing the lip shapes of a person speaking. Our method relies on computing the maximum likelihoods of (a) HMMs used to model phonemes from the acoustic signal, and (b) HMMs used to model visual feature motions from video. One significant addition in this work is dynamic analysis with features selected by AdaBoost on the basis of their discriminant ability. This form of integration, leading to a boosted HMM, lets AdaBoost find the best features first and then uses the HMM to exploit the dynamic information inherent in the signal.
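
The feature-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes binary labels in {-1, +1}, uses one-feature threshold classifiers (decision stumps) as the weak learners, and leaves out the HMM stage entirely. The names `best_stump` and `adaboost_select` and the toy data are hypothetical, chosen only for illustration.

```python
import math


def best_stump(X, y, w, feature):
    """Return (weighted_error, threshold, polarity) of the best stump on one feature.

    A stump predicts `polarity` when the feature value is >= threshold
    and `-polarity` otherwise. Labels are assumed to be -1 or +1.
    """
    best = (float("inf"), 0.0, 1)
    for thr in sorted({row[feature] for row in X}):
        for pol in (1, -1):
            err = sum(
                wi
                for row, yi, wi in zip(X, y, w)
                if (pol if row[feature] >= thr else -pol) != yi
            )
            if err < best[0]:
                best = (err, thr, pol)
    return best


def adaboost_select(X, y, rounds):
    """Run AdaBoost with stumps; return a list of (feature, threshold, polarity, alpha).

    The order of the list reflects the order in which features were picked,
    i.e. the most discriminative feature under the current weights comes first.
    """
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    chosen = []
    for _ in range(rounds):
        # Pick the feature whose best stump has the lowest weighted error.
        err, thr, pol, f = min(
            best_stump(X, y, w, f) + (f,) for f in range(len(X[0]))
        )
        if err >= 0.5:  # no weak learner better than chance remains
            break
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1.0 - err) / err)
        chosen.append((f, thr, pol, alpha))
        # Increase the weight of misclassified samples, then renormalise.
        w = [
            wi * math.exp(-alpha * yi * (pol if row[f] >= thr else -pol))
            for row, yi, wi in zip(X, y, w)
        ]
        total = sum(w)
        w = [wi / total for wi in w]
    return chosen


# Toy data (hypothetical): feature 0 is uninformative, feature 1 separates the classes.
X = [[0.3, 0.1], [0.9, 0.2], [0.1, 0.8], [0.7, 0.9]]
y = [-1, -1, 1, 1]
selected = adaboost_select(X, y, rounds=1)
print(selected[0][0])  # index of the first feature AdaBoost picked
```

In a pipeline of the kind the abstract outlines, the indices in `selected` would determine which feature dimensions are passed on to the HMM, which then models how those features evolve over time. How the paper couples the two stages in detail is not stated in this record.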

Keywords

audio-visual systems; feature extraction; hidden Markov models; maximum likelihood estimation; speech processing; speech recognition; video signal processing; AdaBoost; acoustic measurement; acoustic signal; boosted audio-visual HMM; dynamic analysis; feature selection; hidden Markov model; lip shape recognition; maximum likelihood; phoneme model; speech reading; video signal; visual feature motion; visual measurement; Acoustic applications; Acoustic measurements; Educational institutions; Face detection; Hidden Markov models; Information analysis; Natural languages; Shape measurement; Signal analysis; Speech recognition;

fLanguage

English

Publisher

IEEE

Conference_Titel

Signals, Systems and Computers, 2004. Conference Record of the Thirty-Seventh Asilomar Conference on

Print_ISBN

0-7803-8104-1

Type

conf

DOI

10.1109/ACSSC.2003.1292334

Filename

1292334