DocumentCode :
2053665
Title :
An integrated framework for multi-channel multi-source localization and voice activity detection
Author :
Taghizadeh, Mohammad J. ; Garner, Philip N. ; Bourlard, Hervé ; Abutalebi, Hamid R. ; Asaei, Afsaneh
Author_Institution :
Idiap Res. Inst., Martigny, Switzerland
fYear :
2011
fDate :
May 30 2011-June 1 2011
Firstpage :
92
Lastpage :
97
Abstract :
Two of the major challenges in microphone-array-based adaptive beamforming, speech enhancement and distant speech recognition are robust, accurate source localization and voice activity detection. This paper introduces a spatial gradient steered response power method using the phase transform (SRP-PHAT), which is capable of localizing competing speakers in overlapping conditions. We further investigate the behavior of the SRP function and theoretically characterize a fixed point in its search space for the diffuse noise field. We call this fixed point the null position in the SRP search space. Building on this evidence, we propose a technique for multichannel voice activity detection (MVAD) based on detection of a maximum power corresponding to the null position. The gradient SRP-PHAT, in tandem with the MVAD, forms an integrated framework for multi-source localization and voice activity detection. Experiments carried out on real data recordings show that this framework is very effective in practical applications of hands-free communication.
Keywords :
array signal processing; gradient methods; microphone arrays; speech recognition; transforms; MVAD; adaptive beamforming; data recording; distant speech recognition; gradient SRP-PHAT method; hands-free communication; microphone array; multichannel multisource localization; multichannel voice activity detection; phase transform; spatial gradient steered response power; speech enhancement; voice activity detection; Azimuth; Estimation; Microphone arrays; Noise; Power generation; Speech; Diffuse noise field; Multi-channel voice activity detection; Multi-source localization; Steered Response Power (SRP) localization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Hands-free Speech Communication and Microphone Arrays (HSCMA), 2011 Joint Workshop on
Conference_Location :
Edinburgh
Print_ISBN :
978-1-4577-0997-5
Type :
conf
DOI :
10.1109/HSCMA.2011.5942417
Filename :
5942417