DocumentCode :
2974735
Title :
Improved decision trees for multi-stream HMM-based audio-visual continuous speech recognition
Author :
Huang, Jing ; Visweswariah, Karthik
Author_Institution :
IBM T.J. Watson Res. Center, Yorktown Heights, NY, USA
fYear :
2009
fDate :
Dec. 13 2009-Dec. 17 2009
Firstpage :
228
Lastpage :
231
Abstract :
HMM-based audio-visual speech recognition (AVSR) systems have shown success in continuous speech recognition by combining visual and audio information, especially in noisy environments. In this paper we study how to improve the decision trees used to create context classes in HMM-based AVSR systems. Traditionally, visual models have been trained with the same context classes as the audio-only models. We investigate the use of separate decision trees to model the context classes for the audio and visual streams independently. Additionally, we investigate the use of viseme classes in decision tree building for the visual stream. In experiments on a 37-speaker, 1.5-hour test set (about 12,000 words) of continuous digits in noise, we obtain about a 3% absolute (20% relative) gain in AVSR performance by using separate decision trees for the audio and visual streams when using viseme classes in decision tree building for the visual stream.
Keywords :
audio-visual systems; decision trees; hidden Markov models; speech recognition; AVSR performance; HMM-based AVSR systems; decision tree building; decision trees; multistream HMM-based audio-visual continuous speech recognition; noisy environments; viseme classes; visual and audio information; visual stream; Acoustic noise; Automatic speech recognition; Context modeling; Decision trees; Decoding; Hidden Markov models; Performance gain; Signal to noise ratio; Speech recognition; Streaming media;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Automatic Speech Recognition & Understanding, 2009. ASRU 2009. IEEE Workshop on
Conference_Location :
Merano
Print_ISBN :
978-1-4244-5478-5
Electronic_ISBN :
978-1-4244-5479-2
Type :
conf
DOI :
10.1109/ASRU.2009.5373454
Filename :
5373454