DocumentCode
1858210
Title
Discriminative training of HMM stream exponents for audio-visual speech recognition
Author
Potamianos, Gerasimos ; Graf, Hans Peter
Author_Institution
AT&T Labs., Florham Park, NJ, USA
Volume
6
fYear
1998
fDate
12-15 May 1998
Firstpage
3733
Abstract
We propose the use of discriminative training by means of the generalized probabilistic descent (GPD) algorithm to estimate hidden Markov model (HMM) stream exponents for audio-visual speech recognition. Synchronized audio and visual features are used to train, respectively, audio-only and visual-only single-stream HMMs of identical topology by maximum likelihood. A two-stream HMM is then obtained by combining the two single-stream HMMs and introducing exponents that weight the log-likelihood of each stream. We present the GPD algorithm for stream exponent estimation, consider a possible initialization, and apply it to the single-speaker connected-letters task of the AT&T bimodal database. We demonstrate that the resulting multi-stream HMM outperforms the audio-only, visual-only, and audio-visual single-stream HMMs.
Keywords
audio-visual systems; feature extraction; hidden Markov models; maximum likelihood estimation; probability; speech recognition; synchronisation; AT&T bimodal database; HMM stream exponents; audio features; audio-only stream; audio-visual speech recognition; discriminative training; generalized probabilistic descent algorithm; hidden Markov model; initialization; log-likelihood; maximum likelihood; single speaker connected letters task; stream exponent estimation; synchronized features; two-stream HMM; visual features; visual-only stream; Automatic speech recognition; Hidden Markov models; Lips; Mutual information; Speech recognition; Streaming media; Testing; Topology; Visual databases; Vocabulary;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on
Conference_Location
Seattle, WA
ISSN
1520-6149
Print_ISBN
0-7803-4428-6
Type
conf
DOI
10.1109/ICASSP.1998.679695
Filename
679695