• DocumentCode
    2053665
  • Title

    An integrated framework for multi-channel multi-source localization and voice activity detection

  • Author

    Taghizadeh, Mohammad J. ; Garner, Philip N. ; Bourlard, Hervé ; Abutalebi, Hamid R. ; Asaei, Afsaneh

  • Author_Institution
    Idiap Res. Inst., Martigny, Switzerland
  • fYear
    2011
  • fDate
    May 30 2011-June 1 2011
  • Firstpage
    92
  • Lastpage
    97
  • Abstract
    Two of the major challenges in microphone-array-based adaptive beamforming, speech enhancement and distant speech recognition are robust and accurate source localization and voice activity detection. This paper introduces a spatial gradient steered response power using the phase transform (SRP-PHAT) method which is capable of localizing competing speakers in overlapping conditions. We further investigate the behavior of the SRP function and characterize theoretically a fixed point in its search space for the diffuse noise field. We call this fixed point the null position in the SRP search space. Building on this evidence, we propose a technique for multichannel voice activity detection (MVAD) based on detection of a maximum power corresponding to the null position. The gradient SRP-PHAT in tandem with the MVAD forms an integrated framework for multi-source localization and voice activity detection. Experiments carried out on real data recordings show that this framework is very effective in practical applications of hands-free communication.
  • Keywords
    array signal processing; gradient methods; microphone arrays; speech recognition; transforms; MVAD; adaptive beamforming; data recording; distant speech recognition; gradient SRP-PHAT method; hands-free communication; microphone array; multichannel multisource localization; multichannel voice activity detection; phase transform; spatial gradient steered response power; speech enhancement; voice activity detection; Azimuth; Estimation; Microphone arrays; Noise; Power generation; Speech; Diffuse noise field; Multi-channel voice activity detection; Multi-source localization; Steered Response Power (SRP) localization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Hands-free Speech Communication and Microphone Arrays (HSCMA), 2011 Joint Workshop on
  • Conference_Location
    Edinburgh
  • Print_ISBN
    978-1-4577-0997-5
  • Type

    conf

  • DOI
    10.1109/HSCMA.2011.5942417
  • Filename
    5942417