• DocumentCode
    1748660
  • Title
    Sequential Monte Carlo fusion of sound and vision for speaker tracking
  • Author
    Vermaak, J. ; Gangnet, M. ; Blake, A. ; Pérez, P.
  • Author_Institution
    Microsoft Research, Cambridge, UK
  • Volume
    1
  • fYear
    2001
  • fDate
    2001
  • Firstpage
    741
  • Abstract
    Video telephony could be considerably enhanced by a tracking system that allows the speaker freedom of movement while maintaining a well-framed image for transmission over limited bandwidth. Commercial multi-microphone systems already exist that track speaker direction in order to reject background noise. Stereo sound and vision are complementary modalities: sound is good for initialisation (where vision is expensive), whereas vision is good for localisation (where sound is less precise). Using generative probabilistic models and particle filtering, we show that stereo sound and vision can indeed be fused effectively, yielding a system more capable than either modality on its own.
  • Keywords
    Monte Carlo methods; computer vision; stereo image processing; videotelephony; complementary modalities; generative probabilistic models; multi-microphone systems; particle filtering; sequential Monte Carlo fusion; speaker tracking; stereo sound; stereo vision; video telephony; well-framed image; Cameras; Delay effects; Fuses; Loudspeakers; Microphones; Reverberation; Signal processing; Telephony; Tracking
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International Conference on
  • Conference_Location
    Vancouver, BC
  • Print_ISBN
    0-7695-1143-0
  • Type
    conf
  • DOI
    10.1109/ICCV.2001.937600
  • Filename
    937600