• DocumentCode
    426255
  • Title
    Robust speech interface based on audio and video information fusion for humanoid HRP-2
  • Author
    Hara, Isao ; Asano, Futoshi ; Asoh, Hideki ; Ogata, Jun ; Ichimura, Naoyuki ; Kawai, Yoshihiro ; Kanehiro, Fumio ; Hirukawa, Hirohisa ; Yamamoto, Kiyoshi
  • Author_Institution
    Inf. Tech. Res. Inst., AIST, Tsukuba, Japan
  • Volume
    3
  • fYear
    2004
  • fDate
    28 Sept.-2 Oct. 2004
  • Firstpage
    2404
  • Abstract
    For cooperative work between robots and humans in the real world, a speech-based communicative function is indispensable for robots. To realize such a function in a noisy real environment, it is essential that robots be able to extract target speech spoken by humans from a mixture of sounds using their own resources. We have developed a method of detecting and extracting speech events based on the fusion of audio and video information. In this method, audio information (sound localization using a microphone array) and video information (human tracking using a camera) are fused by a Bayesian network to enable the detection of speech events. The information on detected speech events is then utilized in sound separation using adaptive beam forming. In this paper, some basic investigations for applying the above system to the humanoid robot HRP-2 are reported. Input devices, namely a microphone array and a camera, were mounted on the head of HRP-2, and the acoustic characteristics affecting sound localization/separation performance were investigated. Also, the human tracking system was improved so that it can be used in dynamic situations. Finally, the overall performance of the system was tested via off-line experiments.
  • Keywords
    array signal processing; belief networks; humanoid robots; intelligent robots; man-machine systems; sensor fusion; speech processing; video signal processing; Bayesian network; adaptive beam forming; audio-video information fusion; camera; human tracking system; humanoid robot; microphone array; speech events detection; Acoustic noise; Cameras; Collaborative work; Data mining; Event detection; Humans; Microphone arrays; Robot vision systems; Robustness; Speech;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Robots and Systems, 2004. (IROS 2004). Proceedings. 2004 IEEE/RSJ International Conference on
  • Print_ISBN
    0-7803-8463-6
  • Type
    conf
  • DOI
    10.1109/IROS.2004.1389768
  • Filename
    1389768