DocumentCode :
417299
Title :
DBN based multi-stream models for audio-visual speech recognition
Author :
Gowdy, John N. ; Subramanya, Amarnag ; Bartels, Chris ; Bilmes, Jeff
Author_Institution :
Clemson Univ., SC, USA
Volume :
1
fYear :
2004
fDate :
17-21 May 2004
Abstract :
In this paper, we propose a model based on dynamic Bayesian networks (DBN) to integrate information from multiple audio and visual streams. We also compare the DBN-based system (implemented using the Graphical Model Toolkit (GMTK)) with a classical HMM (implemented in the Hidden Markov Model Toolkit (HTK)) for both the single and two stream integration problems. We also propose a new model (mixed integration) to integrate information from three or more streams derived from different modalities and compare the new model's performance with that of a synchronous integration scheme. A new technique to estimate stream confidence measures for the integration of three or more streams is also developed and implemented. Results from our implementation using the Clemson University Audio Visual Experiments (CUAVE) database indicate an absolute improvement of about 4% in word accuracy in the -4 to 10 dB average case when making use of two audio and one video streams for the mixed integration models over the synchronous models.
Keywords :
Bayes methods; belief networks; speech recognition; CUAVE database; Clemson University Audio Visual Experiments database; DBN; GMTK; Graphical Model Toolkit; HMM; HTK; Hidden Markov Model Toolkit; audio-visual speech recognition; dynamic Bayesian networks; multi-stream models; performance; stream confidence measure estimation; Active noise reduction; Audio databases; Bayesian methods; Graphical models; Hidden Markov models; Random variables; Speech recognition; Streaming media; Visual databases; Working environment noise;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
ISSN :
1520-6149
Print_ISBN :
0-7803-8484-9
Type :
conf
DOI :
10.1109/ICASSP.2004.1326155
Filename :
1326155