Title :
Microphone Array Processing for Distant Speech Recognition: From Close-Talking Microphones to Far-Field Sensors
Author :
Kumatani, Kenichi ; McDonough, John ; Raj, Bhiksha
Author_Institution :
Disney Res., Pittsburgh, PA, USA
Abstract :
Distant speech recognition (DSR) holds the promise of the most natural human-computer interface because it enables man-machine interactions through speech, without the necessity of donning intrusive body- or head-mounted microphones. Recognizing distant speech robustly, however, remains a challenge. This contribution provides a tutorial overview of DSR systems based on microphone arrays. In particular, we present recent work on acoustic beamforming for DSR, along with experimental results verifying the effectiveness of the various algorithms described here. Beginning from a word error rate (WER) of 14.3% with a single microphone of a linear array, our state-of-the-art DSR system achieved a WER of 5.3%, which was comparable to the 4.2% obtained with a lapel microphone. Moreover, we present an emerging technology in the area of far-field audio and speech processing based on spherical microphone arrays. Performance comparisons of spherical and linear arrays reveal that a spherical array with a diameter of 8.4 cm can provide recognition accuracy comparable to or better than that obtained with a large linear array with an aperture length of 126 cm.
Keywords :
acoustic signal processing; array signal processing; human computer interaction; microphone arrays; speech recognition; DSR systems; WER; acoustic beamforming; body-mounted microphones; close-talking microphones; distant speech recognition; far-field audio processing; far-field sensors; head-mounted microphones; human computer interface; lapel microphone; linear array; man-machine interactions; microphone array processing; size 126 cm; size 8.4 cm; speech processing; spherical microphone arrays; word error rate; Automatic speech recognition; Microphones; Tutorials
Journal_Title :
Signal Processing Magazine, IEEE
DOI :
10.1109/MSP.2012.2205285