DocumentCode :
3328962
Title :
Two-layered audio-visual speech recognition for robots in noisy environments
Author :
Yoshida, Takami ; Nakadai, Kazuhiro ; Okuno, Hiroshi G.
Author_Institution :
Grad. Sch. of Inf. Sci. & Eng., Tokyo Inst. of Technol., Tokyo, Japan
fYear :
2010
fDate :
18-22 Oct. 2010
Firstpage :
988
Lastpage :
993
Abstract :
Audio-visual (AV) integration is one of the key ideas for improving perception in noisy real-world environments. This paper describes automatic speech recognition (ASR) based on AV integration, aimed at improving human-robot interaction. We developed an AV-integrated ASR system with two AV integration layers: voice activity detection (VAD) and ASR. However, the system has three difficulties: 1) VAD and ASR have been studied separately although these processes are mutually dependent, 2) VAD and ASR assumed that high-resolution images are available although this assumption never holds in the real world, and 3) the optimal weight between the audio and visual streams was fixed although their reliabilities change with the environment. To solve these problems, we propose a new VAD algorithm that takes ASR characteristics into account, and a linear-regression-based optimal weight estimation method. We evaluate the algorithm on auditory- and/or visually-contaminated data. Preliminary results show that the robustness of VAD improved even when the image resolution is low, and that AVSR using the estimated stream weight demonstrates the effectiveness of AV integration.
Keywords :
audio-visual systems; hearing; human-robot interaction; image resolution; mobile robots; regression analysis; robust control; speech recognition; visual perception; automatic speech recognition; high resolution images; human-robot interaction; linear-regression-based optimal weight estimation; noisy environments; perception; robots; robustness; two-layered audio-visual speech recognition; visually-contaminated data; voice activity detection;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Intelligent Robots and Systems (IROS), 2010 IEEE/RSJ International Conference on
Conference_Location :
Taipei
ISSN :
2153-0858
Print_ISBN :
978-1-4244-6674-0
Type :
conf
DOI :
10.1109/IROS.2010.5651205
Filename :
5651205