Title :
Real-time semi-blind speech extraction with speaker direction tracking on Kinect
Author :
Onuma, Y. ; Kamado, N. ; Saruwatari, Hiroshi ; Shikano, Kiyohiro
Author_Institution :
Grad. Sch. of Inf. Sci., Nara Inst. of Sci. & Technol., Nara, Japan
Abstract :
In this paper, we address the improvement of speech recognition accuracy for ICA-based multichannel noise reduction in a spoken-dialogue robot. First, to achieve high recognition accuracy for the target speaker's early utterances, we introduce a new rapid ICA initialization method that combines the robot's image information with a prestored bank of initial separation filters. With this image information, an initial ICA filter fitted to the user's direction can be used, so that the user's first utterance is not lost. Next, we propose a new permutation-solving method based on a probability statistics model for realistic sound mixtures consisting of point-source speech and diffuse noise. We implement these methods using user tracking on Microsoft Kinect and evaluate them in a speech recognition experiment in a real environment. The experimental results show that the proposed approaches markedly improve word recognition accuracy.
Keywords :
robot kinematics; speaker recognition; ICA-based multichannel noise reduction; Kinect; image information; initial separation filter bank; permutation solving method; probability statistics; real-time semi-blind speech extraction; speaker direction tracking; speech recognition; spoken-dialogue robot; target speaker; Accuracy; Arrays; Noise; Real-time systems; Robots; Speech; Speech recognition
Conference_Title :
Signal & Information Processing Association Annual Summit and Conference (APSIPA ASC), 2012 Asia-Pacific
Conference_Location :
Hollywood, CA
Print_ISBN :
978-1-4673-4863-8