Title :
Enhanced Robot Speech Recognition Based on Microphone Array Source Separation and Missing Feature Theory
Author :
Yamamoto, Shun'ichi ; Valin, Jean-Marc ; Nakadai, Kazuhiro ; Rouat, Jean ; Michaud, François ; Ogata, Tetsuya ; Okuno, Hiroshi G.
Author_Institution :
Graduate School of Informatics, Kyoto University, Kyoto, 606-8501 Japan shunichi@kuis.kyoto-u.ac.jp
Abstract :
A humanoid robot in a real-world environment usually hears mixtures of sounds, so three capabilities are essential for robot audition: sound source localization, separation, and recognition of the separated sounds. While the first two are frequently addressed, the last has received much less study. We present a system that gives a humanoid robot the ability to localize, separate, and recognize simultaneous sound sources. A microphone array is used together with a dedicated real-time implementation of Geometric Source Separation (GSS) and a multi-channel post-filter that further reduces interference from other sources. An automatic speech recognizer (ASR) based on Missing Feature Theory (MFT) recognizes the separated sounds in real time, using missing feature masks generated automatically from the post-filtering step. The main advantage of this approach for humanoid robots is that an ASR with a clean acoustic model can adapt to the distortion of the separated sound by consulting the post-filter feature masks. Recognition rates are presented for three simultaneous speakers located 2 m from the robot. Using both the post-filter and the missing feature mask yields an average relative reduction in error rate of 42%.
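The idea of consulting a feature mask during recognition can be illustrated with a minimal sketch. The abstract says only that the masks are generated automatically from the post-filtering step; the energy-ratio criterion, the `threshold` value, the hard 0/1 mask, and the function names below are all assumptions made for illustration, not the authors' method. The scoring function uses marginalization over unreliable features, a common Missing Feature Theory technique, with a diagonal-covariance Gaussian standing in for one state of a clean acoustic model.

```python
import math


def missing_feature_mask(input_energy, output_energy, threshold=0.5):
    """Return a hard mask: 1 = reliable feature, 0 = unreliable.

    Assumption (not from the source): a feature is reliable when the
    post-filter kept at least `threshold` of its input energy.
    """
    mask = []
    for e_in, e_out in zip(input_energy, output_energy):
        ratio = e_out / e_in if e_in > 0 else 0.0
        mask.append(1 if ratio >= threshold else 0)
    return mask


def masked_log_likelihood(features, mask, means, variances):
    """Log-likelihood under a diagonal Gaussian, skipping masked-out features.

    Dropping the unreliable dimensions is equivalent to marginalizing
    them out, so distorted features cannot penalize the clean model.
    """
    total = 0.0
    for x, m, mu, var in zip(features, mask, means, variances):
        if m:
            total += -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
    return total


# Hypothetical per-band energies before and after the post-filter.
energy_in = [1.0, 2.0, 4.0]
energy_out = [0.9, 0.4, 3.0]
mask = missing_feature_mask(energy_in, energy_out)

# Hypothetical feature vector whose second band is badly distorted.
features = [0.0, 5.0, 0.0]
means = [0.0, 0.0, 0.0]
variances = [1.0, 1.0, 1.0]

score_masked = masked_log_likelihood(features, mask, means, variances)
score_unmasked = masked_log_likelihood(features, [1, 1, 1], means, variances)
print(mask, score_masked, score_unmasked)
```

In this toy case the second band loses most of its energy in the post-filter, so it is marked unreliable and excluded from the score; the masked likelihood is then much higher than the unmasked one, which is the effect that lets a clean model tolerate separation distortion.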
Keywords :
Acoustic distortion; Automatic speech recognition; Error analysis; Humanoid robots; Interference; Loudspeakers; Microphone arrays; Source separation; Speech coding; Speech recognition;
Conference_Title :
Robotics and Automation, 2005. ICRA 2005. Proceedings of the 2005 IEEE International Conference on
Print_ISBN :
0-7803-8914-X
DOI :
10.1109/ROBOT.2005.1570323