Title :
Improved decision trees for multi-stream HMM-based audio-visual continuous speech recognition
Author :
Huang, Jing ; Visweswariah, Karthik
Author_Institution :
IBM T.J. Watson Res. Center, Yorktown Heights, NY, USA
Date :
Nov. 13 2009-Dec. 17 2009
Abstract :
HMM-based audio-visual speech recognition (AVSR) systems combine visual and audio information and have shown success in continuous speech recognition, especially in noisy environments. In this paper we study how to improve the decision trees used to create context classes in HMM-based AVSR systems. Traditionally, the visual models have been trained with the same context classes as the audio-only models. We investigate instead the use of separate decision trees to model the context classes of the audio and visual streams independently, and additionally the use of viseme classes in decision-tree building for the visual stream. In experiments on a 37-speaker, 1.5-hour test set (about 12000 words) of continuous digits in noise, we obtain about a 3% absolute (20% relative) gain in AVSR performance by using separate decision trees for the audio and visual streams, with viseme classes used in decision-tree building for the visual stream.
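The decision-tree context clustering the abstract refers to is typically a greedy top-down procedure: at each node, candidate yes/no questions about the phonetic (or, here, viseme-class) context are scored by the log-likelihood gain of splitting the data, and the best question is kept. A minimal sketch of one such split step, under the usual single-Gaussian surrogate objective (the function names, data layout, and 1-D features are illustrative assumptions, not the authors' implementation):

```python
import math

def gaussian_loglik(values):
    """Log-likelihood of 1-D samples under a single ML-fit Gaussian,
    the standard surrogate objective in phonetic decision-tree clustering."""
    n = len(values)
    mean = sum(values) / n
    var = sum((x - mean) ** 2 for x in values) / n
    var = max(var, 1e-6)  # variance floor to avoid log(0) on constant data
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)

def best_split(samples, questions):
    """Pick the context question that maximizes the likelihood gain.

    samples:   list of (context_phone, feature_value) pairs pooled at a node
    questions: dict mapping question name -> set of contexts answering 'yes'
               (e.g. a viseme class such as the bilabials {p, b, m})
    Returns (gain, question_name, yes_samples, no_samples), or None if no
    question yields a valid split.
    """
    parent_ll = gaussian_loglik([v for _, v in samples])
    best = None
    for name, ctx_set in questions.items():
        yes = [(c, v) for c, v in samples if c in ctx_set]
        no = [(c, v) for c, v in samples if c not in ctx_set]
        if len(yes) < 2 or len(no) < 2:  # require a minimum occupancy
            continue
        gain = (gaussian_loglik([v for _, v in yes])
                + gaussian_loglik([v for _, v in no]) - parent_ll)
        if best is None or gain > best[0]:
            best = (gain, name, yes, no)
    return best

# Toy example: bilabial left-contexts cluster low, others high, so the
# viseme-class question cleanly separates the data and wins the split.
samples = [('p', 0.10), ('b', 0.20), ('m', 0.15),
           ('s', 1.00), ('t', 1.10), ('k', 0.95)]
questions = {'bilabial': {'p', 'b', 'm'}, 'arbitrary': {'p', 's'}}
result = best_split(samples, questions)
```

Using viseme classes in the question set for the visual stream, as the paper proposes, amounts to restricting (or augmenting) `questions` with sets of phones that share the same visual appearance, so the visual tree can tie states along visually meaningful boundaries that an audio-derived tree would miss.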
Keywords :
audio-visual systems; decision trees; hidden Markov models; speech recognition; AVSR performance; HMM-based AVSR systems; decision tree building; decision trees; multistream HMM-based audio-visual continuous speech recognition; noisy environments; viseme classes; visual and audio information; visual stream; Acoustic noise; Automatic speech recognition; Context modeling; Decision trees; Decoding; Hidden Markov models; Performance gain; Signal to noise ratio; Speech recognition; Streaming media;
Conference_Title :
Automatic Speech Recognition & Understanding, 2009. ASRU 2009. IEEE Workshop on
Conference_Location :
Merano, Italy
Print_ISBN :
978-1-4244-5478-5
Electronic_ISBN :
978-1-4244-5479-2
DOI :
10.1109/ASRU.2009.5373454