DocumentCode :
1881041
Title :
Product HMMs for audio-visual continuous speech recognition using facial animation parameters
Author :
Aleksic, Petar S. ; Katsaggelos, Aggelos K.
Author_Institution :
Dept. of Electr. & Comput. Eng., Northwestern Univ., Evanston, IL, USA
Volume :
2
fYear :
2003
fDate :
6-9 July 2003
Abstract :
Using visual information in addition to acoustic information can improve automatic speech recognition (ASR). In this paper we compare different approaches to audio-visual information integration and show how they affect ASR performance. As visual features we use facial animation parameters (FAPs), which the MPEG-4 standard supports for visual representation. We use both single-stream and multi-stream hidden Markov models (HMMs) to integrate audio and visual information, and we perform both state-synchronous and phone-synchronous multi-stream integration. A product HMM topology is used to model the phone-synchronous integration. ASR experiments were performed under noisy audio conditions using a relatively large-vocabulary (approximately 1000 words) audio-visual database. The proposed phone-synchronous system, which performed best, reduces the word error rate (WER) by approximately 20% relative to audio-only ASR (A-ASR) at various SNRs with additive white Gaussian noise.
Keywords :
AWGN; audio-visual systems; computer animation; hidden Markov models; speech processing; speech recognition; MPEG-4 standard; additive white Gaussian noise; audio-visual continuous speech recognition; audio-visual database; audio-visual information integration; automatic speech recognition; facial animation parameters; multistream hidden Markov models; phone-synchronous integration; product HMM topology; single-stream hidden Markov models; state synchronous multistream integration; word error rate; Audio databases; Automatic speech recognition; Facial animation; Hidden Markov models; MPEG 4 Standard; Speech recognition; Topology; Visual databases; Vocabulary
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on
Print_ISBN :
0-7803-7965-9
Type :
conf
DOI :
10.1109/ICME.2003.1221658
Filename :
1221658