• DocumentCode
    2010142
  • Title
    Multiple active speaker localization based on audio-visual fusion in two stages
  • Author
    Li, Zhao ; Herfet, Thorsten ; Grochulla, Martin ; Thormählen, Thorsten
  • Author_Institution
    Telecommun. Lab., Saarland Univ., Saarbrücken, Germany
  • fYear
    2012
  • fDate
    13-15 Sept. 2012
  • Firstpage
    1
  • Lastpage
    7
  • Abstract
    Localization of multiple active speakers in natural environments with only two microphones is a challenging problem. Reverberation degrades the performance of speaker localization based exclusively on directional cues. The audio modality alone suffers from limited localization accuracy, while the video modality alone suffers from false speaker activity detections. This paper presents an approach based on audio-visual fusion in two stages. In the first stage, speaker activity is detected by audio-visual fusion, which can handle false lip movements. In the second stage, a Gaussian fusion method is proposed to integrate the estimates of both modalities. As a consequence, localization accuracy and robustness are significantly increased compared to either the audio or the video modality alone. Experimental results in various scenarios confirm the improved performance of the proposed system.
  • Keywords
    Gaussian processes; audio signal processing; microphones; sensor fusion; speaker recognition; video signal processing; Gaussian fusion method; active speaker localization; audio modality; audio-visual fusion; audiovisual fusion; directional cues; false lip movement; false speaker activity detection; microphone; modality estimation; natural environment; reverberation; video modality; Azimuth; Cameras; Face; Mouth; Robot vision systems; Robustness; Visualization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Multisensor Fusion and Integration for Intelligent Systems (MFI), 2012 IEEE Conference on
  • Conference_Location
    Hamburg
  • Print_ISBN
    978-1-4673-2510-3
  • Electronic_ISBN
    978-1-4673-2511-0
  • Type
    conf
  • DOI
    10.1109/MFI.2012.6343015
  • Filename
    6343015