DocumentCode :
2338626
Title :
Coarse speech recognition by audio-visual integration based on missing feature theory
Author :
Koiwa, Tomoaki ; Nakadai, Kazuhiro ; Imura, Jun-ichi
Author_Institution :
Tokyo Inst. of Technol., Tokyo
fYear :
2007
fDate :
Oct. 29 2007-Nov. 2 2007
Firstpage :
1751
Lastpage :
1756
Abstract :
Audio-visual speech recognition (AVSR) is a promising approach to improving the noise robustness of speech recognition in the real world. In AVSR, a phoneme and a viseme serve as the auditory and visual units, respectively. However, in the real world, they are often misclassified because of additional input noise. To solve this problem, we propose two approaches. One is audio-visual integration based on missing feature theory, which copes with missing or unreliable audio and visual features during recognition. The other is a biologically inspired approach: phoneme and viseme grouping based on coarse-to-fine recognition. Preliminary experiments show that audio-visual speech recognition based on these approaches drastically improves the noise robustness of AVSR.
Keywords :
audio-visual systems; noise; speech recognition; audio-visual integration; coarse speech recognition; missing feature theory; noise robustness; phoneme grouping; viseme grouping; Acoustic noise; Automatic speech recognition; Human robot interaction; Humanoid robots; Microphones; Noise robustness; Robotics and automation; Signal to noise ratio; Speech recognition; Working environment noise;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Intelligent Robots and Systems, 2007. IROS 2007. IEEE/RSJ International Conference on
Conference_Location :
San Diego, CA
Print_ISBN :
978-1-4244-0912-9
Electronic_ISBN :
978-1-4244-0912-9
Type :
conf
DOI :
10.1109/IROS.2007.4399300
Filename :
4399300