DocumentCode :
542692
Title :
Multi-stream product modal audio-visual integration strategy for robust adaptive speech recognition
Author :
Gurbuz, Sabri ; Tufekci, Zekeriya ; Patterson, Eric ; Gowdy, John N.
Author_Institution :
Department of Electrical and Computer Engineering, Clemson University, SC 29634, USA
Volume :
2
fYear :
2002
fDate :
13-17 May 2002
Abstract :
In this paper, we extend an existing audio-only automatic speech recognizer to implement a multi-stream audio-visual automatic speech recognition (AV-ASR) system. Our method forms a multi-stream feature vector from audio-visual speech data, computes the statistical modal parameter probabilities on the basis of the multi-stream audio-visual features, and performs dynamic programming jointly on the multi-stream product modal Hidden Markov Models (MS-PM-HMMs) by utilizing a stream-weighting value based on noise type and signal-to-noise ratio (SNR). Experimental results are presented for an isolated-word recognition task with eight different noise types from the NOISEX database at several SNR values. The proposed system reduces the word error rate (WER), averaged over several SNR values and noise types, from 55.9% with the audio-only recognizer and 7.9% with the late-integration audio-visual recognizer to 2.6% on the validation set.
Keywords :
Gold; Hidden Markov models; Production facilities; Signal to noise ratio; Speech; Speech recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on
Conference_Location :
Orlando, FL, USA
ISSN :
1520-6149
Print_ISBN :
0-7803-7402-9
Type :
conf
DOI :
10.1109/ICASSP.2002.5745029
Filename :
5745029