Title :
Independent information from visual features for multimodal speech recognition
Author :
Gurbuz, Sabri ; Tufekci, Zekeriya ; Patterson, Eric ; Gowdy, John N.
Author_Institution :
Dept. of Electr. & Comput. Eng., Clemson Univ., SC, USA
Abstract :
The performance of audio-based speech recognition systems degrades severely when there is a mismatch between training and usage environments due to background noise. This degradation is due to a loss of ability to extract and distinguish important information from audio features. One of the emerging techniques for dealing with this problem is the addition of visual features in a multimodal recognition system. This paper presents an affine-invariant, multimodal speech recognition system and focuses on the additional information that is available from video features. Results are presented that demonstrate the distinct information available from the visual subsystem, which allows optimal joint decisions, based on the SNR and type of noise, to outperform either the audio or the video subsystem alone in nearly all noisy environments.
Keywords :
acoustic noise; feature extraction; image recognition; speech recognition; video signal processing; SNR-ratio; affine-invariant multimodal speech recognition system; audio features; audio subsystem; audio-based speech recognition systems; background noise; multimodal recognition system; multimodal speech recognition; optimal joint-decisions; video features; visual features; visual subsystem; Acoustic noise; Automatic speech recognition; Background noise; Degradation; Feature extraction; Humans; Speech enhancement; Speech recognition; System performance; Working environment noise
Conference_Title :
SoutheastCon 2001. Proceedings. IEEE
Conference_Location :
Clemson, SC
Print_ISBN :
0-7803-6748-0
DOI :
10.1109/SECON.2001.923119