Title :
Speech Acquisition in Meetings with an Audio-Visual Sensor Array
Author :
McCowan, Iain ; Krishna, Maganti Hari ; Gatica-Perez, Daniel ; Moore, Darren ; Ba, Sileye
Author_Institution :
IDIAP Res. Inst., Martigny
Abstract :
Close-talk headset microphones have been traditionally used for speech acquisition in a number of applications, as they naturally provide a higher signal-to-noise ratio (needed for recognition tasks) than single distant microphones. However, in multi-party conversational settings like meetings, microphone arrays represent an important alternative to close-talking microphones, as they allow for localisation and tracking of speakers and signal-independent enhancement, while providing a non-intrusive, hands-free operation mode. In this article, we investigate the use of an audio-visual sensor array, composed of a small table-top microphone array and a set of cameras, for speaker tracking and speech enhancement in meetings. Our methodology first fuses audio and video for person tracking, and then integrates the output of the tracker with a beamformer for speech enhancement. We compare and discuss the features of the resulting speech signal with respect to that obtained from single close-talking and table-top microphones.
Keywords :
array signal processing; audio-visual systems; image recognition; microphone arrays; sensor fusion; speaker recognition; speech enhancement; audio-visual sensor array; beamformer; cameras; speaker tracking; speech acquisition; table-top microphone array; Application software; Fuses; Multimodal sensors; Pervasive computing; Sensor arrays; Signal to noise ratio; Speech recognition
Conference_Title :
Multimedia and Expo, 2005. ICME 2005. IEEE International Conference on
Conference_Location :
Amsterdam
Print_ISBN :
0-7803-9331-7
DOI :
10.1109/ICME.2005.1521688