• DocumentCode
    1060725
  • Title
    Accelerated speech source localization via a hierarchical search of steered response power
  • Author
    Zotkin, Dmitry N. ; Duraiswami, Ramani
  • Author_Institution
    Perceptual Interfaces & Reality Lab., Univ. of Maryland, College Park, MD, USA
  • Volume
    12
  • Issue
    5
  • fYear
    2004
  • Firstpage
    499
  • Lastpage
    508
  • Abstract
    Accurate and fast localization of multiple speech sound sources is of significant interest in applications such as conferencing systems. Recently, approaches based on searching for local peaks of the steered response power have become popular, despite their known computational expense. Based on the observations that the wavelengths of sound from a speech source are comparable to the dimensions of the space being searched and that the source is broadband, we have developed an efficient search algorithm. Significant speedups are achieved by using coarse-to-fine strategies in both space and frequency. We apply the search algorithm to speed up simple delay-and-sum beamforming and steered response power phase-transform weighted (SRP-PHAT) source localization algorithms. A systematic series of comparisons with previous algorithms shows that the technique is much faster, as well as robust and accurate. The performance of the algorithm can be further improved by using constraints from computer vision.
  • Keywords
    array signal processing; direction-of-arrival estimation; search problems; speech processing; accelerated speech source localization; conferencing system; delay-and-sum beamforming; hierarchical search algorithm; multiple speech sound source; steered response power; steered response power phase-transform weighted source localization algorithm; Acceleration; Delay; Frequency; Inverse problems; Multimedia communication; Robustness; Sensor arrays; Signal processing algorithms; Speech; multimedia applications; position measurement; speech enhancement; transducer arrays; user interfaces
  • fLanguage
    English
  • Journal_Title
    Speech and Audio Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1063-6676
  • Type
    jour
  • DOI
    10.1109/TSA.2004.832990
  • Filename
    1323086