Title :
Multi-stream product modal audio-visual integration strategy for robust adaptive speech recognition
Author :
Gurbuz, Sabri ; Tufekci, Zekeriya ; Patterson, Eric ; Gowdy, John N.
Author_Institution :
Department of Electrical and Computer Engineering, Clemson University, SC 29634, USA
Abstract :
In this paper, we extend an existing audio-only automatic speech recognizer to implement a multi-stream audio-visual automatic speech recognition (AV-ASR) system. Our method forms a multi-stream feature vector from audio-visual speech data, computes the statistical model probabilities on the basis of the multi-stream audio-visual features, and performs dynamic programming jointly on the multi-stream product modal Hidden Markov Models (MS-PM-HMMs), using a stream-weighting value that depends on the noise type and the signal-to-noise ratio (SNR). Experimental results are presented for an isolated word recognition task with eight different noise types from the NOISEX database at several SNR values. The proposed system reduces the word error rate (WER), averaged over several SNR values and noise types, from 55.9% with the audio-only recognizer and 7.9% with the late-integration audio-visual recognizer to 2.6% on the validation set.
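The stream-weighted joint decoding described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the common multi-stream formulation in which the per-frame, per-state emission log-likelihoods of the audio and visual streams are combined as a weighted sum (audio weight w, visual weight 1 - w) inside a single Viterbi pass. The function names, the two-state model, and all numeric values are hypothetical; in the paper the weight would be chosen from the noise type and SNR.

```python
def combine(audio_ll, visual_ll, audio_weight):
    """Weighted sum of two stream log-likelihoods (assumed: weights sum to 1)."""
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return audio_weight * audio_ll + (1.0 - audio_weight) * visual_ll


def viterbi_score(log_init, log_trans, audio_ll, visual_ll, audio_weight):
    """Best-path log score of one word model.

    audio_ll[t][s] and visual_ll[t][s] are emission log-likelihoods of
    frame t in state s; log_trans[p][s] is the log transition probability
    from state p to state s.
    """
    n_states = len(log_init)
    # Initialisation: start probability plus combined emission score.
    delta = [
        log_init[s] + combine(audio_ll[0][s], visual_ll[0][s], audio_weight)
        for s in range(n_states)
    ]
    # Recursion: both streams are decoded jointly, frame by frame.
    for t in range(1, len(audio_ll)):
        delta = [
            max(delta[p] + log_trans[p][s] for p in range(n_states))
            + combine(audio_ll[t][s], visual_ll[t][s], audio_weight)
            for s in range(n_states)
        ]
    return max(delta)


# Hypothetical two-state model scored on two frames.
log_init = [0.0, -2.0]
log_trans = [[-1.0, -1.0], [-3.0, -0.5]]
audio_ll = [[-1.0, -4.0], [-2.0, -1.0]]
visual_ll = [[-3.0, -2.0], [-4.0, -1.0]]

balanced = viterbi_score(log_init, log_trans, audio_ll, visual_ll, 0.5)
audio_only = viterbi_score(log_init, log_trans, audio_ll, visual_ll, 1.0)
```

For isolated word recognition, one such score would be computed per word model and the highest-scoring word selected. Because the streams are combined inside the recursion rather than after each stream has been decoded separately, the sketch differs from late integration, where each stream yields its own word score before fusion.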
Keywords :
Gold; Hidden Markov models; Production facilities; Signal to noise ratio; Speech; Speech recognition;
Conference_Title :
Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on
Conference_Location :
Orlando, FL, USA
Print_ISBN :
0-7803-7402-9
DOI :
10.1109/ICASSP.2002.5745029