Title :
Toward movement-invariant automatic lip-reading and speech recognition
Author :
Duchnowski, Paul ; Hunke, Martin ; Büsching, Dietrich ; Meier, Uwe ; Waibel, Alex
Author_Institution :
Interactive Syst. Lab., Karlsruhe Univ., Germany
Abstract :
We present the development of a modular system for flexible human-computer interaction via speech. The speech recognition component integrates acoustic and visual information (automatic lip-reading), improving overall recognition, especially in noisy environments. The image of the lips, constituting the visual input, is automatically extracted from the camera picture of the speaker's face by the lip locator module. Finally, the speaker's face is automatically acquired and followed by the face tracker sub-system. Integration of the three functions results in the first bi-modal speech recognizer allowing the speaker reasonable freedom of movement within a possibly noisy room while continuing to communicate with the computer via voice. Compared to audio-alone recognition, the combined system achieves a 20 to 50 percent error rate reduction for various signal/noise conditions.
Keywords :
image processing; man-machine systems; modules; speech recognition; video cameras; visual communication; acoustic information; bi-modal speech recognizer; camcorder; camera picture; error rate reduction; face tracker sub-system; human-computer interaction; lip locator module; modular system; movement-invariant automatic lip-reading; noisy environments; signal/noise conditions; speaker face; speech recognition; visual information; voice communication; Acoustic noise; Automatic speech recognition; Cameras; Data mining; Error analysis; Interactive systems; Keyboards; Laboratories; Lips; Noise reduction; Speech recognition
Conference_Title :
Acoustics, Speech, and Signal Processing, 1995. ICASSP-95., 1995 International Conference on
Conference_Location :
Detroit, MI
Print_ISBN :
0-7803-2431-5
DOI :
10.1109/ICASSP.1995.479285