Title :
Adaptive bimodal sensor fusion for automatic speechreading
Author :
Meier, Uwe ; Hürst, Wolfgang ; Duchnowski, Paul
Author_Institution :
Interactive Syst. Labs., Karlsruhe Univ., Germany
Abstract :
We present work on improving the performance of automated speech recognizers by using additional visual information (lip-/speechreading), achieving error reduction of up to 50%. This paper focuses on different methods of combining the visual and acoustic data to improve recognition performance. We show this on an extension of an existing state-of-the-art speech recognition system, a modular MS-TDNN. We have developed adaptive combination methods at several levels of the recognition network. Additional information such as the estimated signal-to-noise ratio (SNR) is used in some cases. The results of the different combination methods are shown for clean speech and for data with artificial noise (white, music, motor). The new combination methods adapt automatically to varying noise conditions, making hand-tuned parameters unnecessary.
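The idea of an SNR-dependent combination can be illustrated with a minimal sketch. It is not taken from the paper: the linear weight ramp, the `snr_low`/`snr_high` bounds, and the function names `acoustic_weight`, `fuse_scores`, and `best_class` are all assumptions chosen for illustration. The abstract states only that the estimated SNR is used in some of the combination methods.

```python
def acoustic_weight(snr_db, snr_low=0.0, snr_high=30.0):
    """Map an estimated SNR in dB to an acoustic weight in [0, 1].

    Hypothetical linear ramp: at or below snr_low the acoustic stream is
    ignored, at or above snr_high the visual stream is ignored.
    """
    if snr_high <= snr_low:
        raise ValueError("snr_high must be greater than snr_low")
    w = (snr_db - snr_low) / (snr_high - snr_low)
    return min(1.0, max(0.0, w))


def fuse_scores(acoustic, visual, snr_db):
    """Combine per-class scores from both streams as a convex weighted sum."""
    if len(acoustic) != len(visual):
        raise ValueError("score vectors must have the same length")
    w_a = acoustic_weight(snr_db)
    w_v = 1.0 - w_a  # the two weights always sum to one
    return [w_a * a + w_v * v for a, v in zip(acoustic, visual)]


def best_class(scores):
    """Return the index of the highest combined score."""
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Illustrative numbers only, not results from the paper.
    acoustic = [0.2, 0.8]
    visual = [0.9, 0.1]
    for snr in (0.0, 15.0, 30.0):
        fused = fuse_scores(acoustic, visual, snr)
        print(snr, fused, best_class(fused))
```

Because the weight is derived from the SNR estimate rather than fixed in advance, the balance between the two streams shifts with the noise level without a hand-tuned parameter per condition, which is the property the abstract emphasizes.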
Keywords :
acoustic signal processing; adaptive signal processing; image processing; multilayer perceptrons; sensor fusion; speech recognition; SNR; acoustic data; adaptive bimodal sensor fusion; adaptive combination methods; artificial noise; automated speech recognizer performance; automatic speechreading; clean speech; error reduction; lip reading; modular MS-TDNN; motor; music; noise conditions; recognition network; signal-to-noise ratio; speech recognition system; visual data; visual information; white noise; Acoustic noise; Acoustic testing; Background noise; Interactive systems; Loudspeakers; Sensor fusion; Signal to noise ratio; Speech enhancement; Speech recognition; White noise;
Conference_Title :
Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE International Conference on
Conference_Location :
Atlanta, GA
Print_ISBN :
0-7803-3192-3
DOI :
10.1109/ICASSP.1996.543250