• DocumentCode
    2523524
  • Title

    A panoramic video and acoustic beamforming sensor for videoconferencing

  • Author

    Fiala, Mark ; Green, David ; Roth, Gerhard

  • Author_Institution
    Comput. Video Group, Nat. Res. Council, Ottawa, Ont., Canada
  • fYear
    2004
  • fDate
    2-3 Oct. 2004
  • Firstpage
    47
  • Lastpage
    52
  • Abstract
    Videoconferencing systems in use today typically rely on either fixed or pan/tilt/zoom cameras for image acquisition, and close-talking microphones for good-quality audio capture. These sensors are unsuitable for scenarios involving multiple users seated at a meeting table, or non-stationary users. In these situations, the focus of attention should change from one talker to the next, and if possible track moving users. This work describes a multi-modal perception system using both video and audio signals for such a videoconferencing system. An omnidirectional video camera and an audio beamforming array are combined into a device placed in the center of a meeting table. The video and audio are processed to determine the direction of the talker; a virtual perspective view and a directional audio beam are then created. Computer vision algorithms are used to find people by motion and by face and marker detection. The audio beamformer merges the signals from a circular array of microphones to provide audio power measurements in different directions simultaneously. The video and audio cues are combined to make a decision as to the location of the talker. The system has been integrated with OpenH.323 and serves as a node using Microsoft NetMeeting.
  • Keywords
    acoustic signal processing; array signal processing; computer vision; sensors; teleconferencing; video signal processing; Microsoft NetMeeting; OpenH.323; acoustic beamforming sensor; audio signals; close-talking microphones; computer vision algorithm; image acquisition; multimodal perception system; pan/tilt/zoom camera; panoramic video sensor; video signals; videoconferencing system; Acoustic beams; Acoustic sensors; Array signal processing; Cameras; Computer vision; Face detection; Focusing; Microphone arrays; Motion detection; Teleconferencing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Haptic, Audio and Visual Environments and Their Applications, 2004. HAVE 2004. Proceedings. The 3rd IEEE International Workshop on
  • Print_ISBN
    0-7803-8817-8
  • Type

    conf

  • DOI
    10.1109/HAVE.2004.1391880
  • Filename
    1391880