Title :
Robot Audition from the Viewpoint of Computational Auditory Scene Analysis
Author :
Okuno, Hiroshi G. ; Ogata, Tetsuya ; Komatani, Kazunori
Author_Institution :
Kyoto Univ., Kyoto
Abstract :
We have been engaged in research on computational auditory scene analysis to attain sophisticated robot/computer-human interaction by manipulating real-world sound signals. The objective of our research is to understand an arbitrary sound mixture, including music and environmental sounds as well as voiced speech, captured by the robot's ears (microphones embedded in the robot). The three main issues in computational auditory scene analysis are sound source localization, sound source separation, and recognition of the separated sounds, for mixtures of speech signals as well as polyphonic music signals. The Missing Feature Theory (MFT) approach integrates sound source separation and automatic speech recognition by generating missing feature masks. This robot audition system has been successfully ported to three kinds of robots: SIG2, Robovie R2, and Honda ASIMO. A robot recognizes three simultaneous utterances, for example in placing meal orders or in refereeing Rock-Paper-Scissors Sound Games, with a delay of less than 2 seconds. A real-time beat tracking system has also been developed for robot audition: the robot hears music, then understands and predicts its musical beats so that it can behave in accordance with the beat times in real time.
Keywords :
acoustic signal processing; feature extraction; man-machine systems; robots; source separation; speech recognition; Honda ASIMO; Robovie R2 robot; SIG2 robot; beat tracking system; computational auditory scene analysis; environmental sound; feature masks; microphones; missing feature theory; musical beats; polyphonic music signals; robot audition system; robot ears; robot hearing; robot-computer human interaction; sound mixture; sound signal manipulation; sound source localization; sound source separation; speech recognition; speech signals; voiced speech; Automatic speech recognition; Ear; Human robot interaction; Image analysis; Microphones; Multiple signal classification; Music; Robotics and automation; Speech analysis; Speech recognition; CASA; Computational Auditory Scene Analysis; Robot Audition;
Conference_Titel :
Informatics Education and Research for Knowledge-Circulating Society, 2008. ICKS 2008. International Conference on
Conference_Location :
Kyoto
Print_ISBN :
978-0-7695-3128-1
DOI :
10.1109/ICKS.2008.10