DocumentCode :
1904506
Title :
Bimodal sensor integration on the example of 'speechreading'
Author :
Bregler, Christoph ; Manke, Stefan ; Hild, Hermann ; Waibel, Alex
Author_Institution :
Dept. of Comput. Sci., Karlsruhe Univ., Germany
fYear :
1993
fDate :
1993
Firstpage :
667
Abstract :
It is shown how recognition performance in automated speech perception can be significantly improved by additional lipreading, so-called speech-reading. This is demonstrated on an extension of an existing state-of-the-art speech recognition system, a modular multi-state time-delay neural network (MS-TDNN). The acoustic and visual speech data are preclassified in two separate front-end phoneme TDNNs and combined into acoustic-visual hypotheses for the dynamic time warping algorithm. The approach is evaluated on a connected word recognition problem, the letter spelling task. With speech-reading, the error rate can be reduced to as little as half the error rate of purely acoustic recognition.
Keywords :
neural nets; speech recognition; acoustic-visual hypotheses; automated speech perception; bimodal sensor; lipreading; modular multistatic time delay neural net; speech recognition; speech-reading; Acoustic devices; Auditory system; Cameras; Computer architecture; Computer science; Error analysis; Humans; Microphones; Signal processing; Speech recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Neural Networks, 1993., IEEE International Conference on
Conference_Location :
San Francisco, CA
Print_ISBN :
0-7803-0999-5
Type :
conf
DOI :
10.1109/ICNN.1993.298634
Filename :
298634