Regional Information Center for Science and Technology (RICEST) - Active audio-visual integration for Voice Activity Detection based on a Causal Bayesian Network

DocumentCode :

2008093

Title :

Active audio-visual integration for Voice Activity Detection based on a Causal Bayesian Network

Author :

Yoshida, Takafumi ; Nakadai, Kazuhiro

Author_Institution :

Grad. Sch. of Inf. Sci. & Eng., Tokyo Inst. of Technol., Tokyo, Japan

fYear :

2012

fDate :

Nov. 29 2012-Dec. 1 2012

Firstpage :

370

Lastpage :

375

Abstract :

This paper presents an active audio-visual integration framework that combines audio and visual information with a robot's active motion for noise-robust Voice Activity Detection (VAD). VAD is crucial for noise-robust Automatic Speech Recognition (ASR) because speech captured by a robot's microphones is usually contaminated by other noise sources. To realize noise-robust VAD, we propose an Active Audio-Visual (AAV) integration framework that integrates auditory, visual, and motion information using a Causal Bayesian Network (CBN). A CBN is a subclass of Bayesian networks that can estimate the effect of active motions on VAD performance. Because the CBN is a general framework for information integration, various types of information that affect VAD performance, such as the locations of a speaker and a noise source, can be introduced naturally, and the CBN selects the optimal active motion for better robot perception using its “intervention” mechanism. We implemented a prototype system based on the proposed framework on a humanoid robot called Hearbo. The proposed AAV-VAD is compared with three types of AV-VAD: simple AAV-VAD, multi-regression-based AAV-VAD, and stationary (not active) AV-VAD. A preliminary experiment using the prototype system showed that the VAD performance of the proposed AAV-VAD was 14.4, 26.0, and 30.3 points higher than that of the simple active, multi-regression-based active, and stationary AV-VAD, respectively.
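
The abstract does not give the network structure or its probabilities, so the following is only a minimal sketch of how intervention-based motion selection in a causal Bayesian network might look. Every variable name (`noise`, `audio_ok`, `visual_ok`), every candidate motion, and every probability below is a hypothetical placeholder chosen for illustration, not a value from the paper. The sketch evaluates P(VAD correct | do(motion)) for each candidate motion by fixing the motion variable by hand and summing over the remaining variables, then picks the motion with the highest value.

```python
from itertools import product

# Hypothetical toy model: all names and numbers are illustrative assumptions,
# not taken from the paper.
MOTIONS = ("stay", "turn_to_speaker", "approach_speaker")

# P(noise): exogenous noise condition, not affected by the robot's motion.
P_NOISE = {"low": 0.4, "high": 0.6}

# P(audio_ok = True | motion, noise)
P_AUDIO_OK = {
    ("stay", "low"): 0.80, ("stay", "high"): 0.30,
    ("turn_to_speaker", "low"): 0.85, ("turn_to_speaker", "high"): 0.45,
    ("approach_speaker", "low"): 0.90, ("approach_speaker", "high"): 0.60,
}

# P(visual_ok = True | motion)
P_VISUAL_OK = {"stay": 0.50, "turn_to_speaker": 0.90, "approach_speaker": 0.70}

# P(vad_correct = True | audio_ok, visual_ok)
P_VAD_CORRECT = {
    (True, True): 0.95, (True, False): 0.80,
    (False, True): 0.75, (False, False): 0.40,
}


def bernoulli(p_true, value):
    """Probability of a binary outcome given P(outcome is True)."""
    return p_true if value else 1.0 - p_true


def p_vad_correct_do(motion):
    """P(vad_correct = True | do(motion)) by truncated factorization.

    The intervention fixes `motion` by hand, so the motion variable
    contributes no factor of its own; every other factor is kept.
    """
    total = 0.0
    for noise, audio_ok, visual_ok in product(P_NOISE, (True, False), (True, False)):
        total += (
            P_NOISE[noise]
            * bernoulli(P_AUDIO_OK[(motion, noise)], audio_ok)
            * bernoulli(P_VISUAL_OK[motion], visual_ok)
            * P_VAD_CORRECT[(audio_ok, visual_ok)]
        )
    return total


def select_motion():
    """Return the motion with the highest interventional score, plus all scores."""
    scores = {m: p_vad_correct_do(m) for m in MOTIONS}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    best, scores = select_motion()
    for motion, score in scores.items():
        print(f"do({motion}): P(VAD correct) = {score:.4f}")
    print("selected motion:", best)
```

With these made-up numbers the sketch selects `turn_to_speaker`, because improving the visual cue outweighs the smaller audio gain from approaching. A real system would need probabilities learned or measured on the robot, and a graph in which speaker and noise-source locations appear as explicit variables.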

Keywords :

belief networks; human-robot interaction; humanoid robots; robot vision; speech recognition; AAV integration framework; ASR; CBN; Hearbo humanoid robot; VAD; active audio-visual integration; active audio-visual integration framework; auditory information; automatic speech recognition; causal Bayesian network; motion information; multiregression-based active AV-VAD; noise-robust VAD; noise-robust voice activity detection; simple active AV-VAD; stationary AV-VAD; visual information; voice activity detection; Microphones; Robots; Robustness; Speech;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Humanoid Robots (Humanoids), 2012 12th IEEE-RAS International Conference on

Conference_Location :

Osaka

ISSN :

2164-0572

Type :

conf

DOI :

10.1109/HUMANOIDS.2012.6651546

Filename :

6651546

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2008093