Title :
Microphone array processing for distant speech recognition: Towards real-world deployment
Author :
Kumatani, Kenichi ; Arakawa, Takeshi ; Yamamoto, Koji ; McDonough, John ; Raj, Bhiksha ; Singh, Rajdeep ; Tashev, I.
Author_Institution :
Disney Res., Pittsburgh, PA, USA
Abstract :
Distant speech recognition (DSR) holds out the promise of providing a natural human computer interface in that it enables verbal interactions with computers without the necessity of donning intrusive body- or head-mounted devices. Recognizing distant speech robustly, however, remains a challenge. This paper provides an overview of DSR systems based on microphone arrays. In particular, we present recent work on acoustic beamforming for DSR, along with experimental results verifying the effectiveness of the various algorithms described here; beginning from a word error rate (WER) of 14.3% with a single microphone of a 64-channel linear array, our state-of-the-art DSR system achieved a WER of 5.3%, which was comparable to that of 4.2% obtained with a lapel microphone. Furthermore, we report the results of speech recognition experiments on data captured with a popular device, the Kinect [1]. Even for speakers at a distance of four meters from the Kinect, our DSR system achieved acceptable recognition performance on a large vocabulary task, a WER of 24.1%, beginning from a WER of 42.5% with a single array channel.
Keywords :
error statistics; human computer interaction; microphone arrays; speech recognition; 64-channel linear array; DSR systems; Kinect; WER; acoustic beamforming; distant speech recognition; microphone array processing; natural human computer interface; real-world deployment; single array channel; verbal interactions; word error rate; Array signal processing; Arrays; Microphones; Noise; Sensors; Speech recognition; Vectors;
Conference_Title :
Signal & Information Processing Association Annual Summit and Conference (APSIPA ASC), 2012 Asia-Pacific
Conference_Location :
Hollywood, CA
Print_ISBN :
978-1-4673-4863-8