  • DocumentCode
    3348327
  • Title

    Audio-visual graphical models for speech processing

  • Author

    Hershey, John ; Attias, Hagai ; Jojic, Nebojsa ; Kristjansson, Trausti

  • Author_Institution
    Machine Perception Lab., Univ. of California, San Diego, CA, USA
  • Volume
    5
  • fYear
    2004
  • fDate
    17-21 May 2004
  • Abstract
    Perceiving sounds in a noisy environment is a challenging problem. Visual lip-reading can provide relevant information but is also challenging because lips are moving and a tracker must deal with a variety of conditions. Typically, audio-visual systems have been assembled from individually engineered modules. We propose to fuse audio and video in a probabilistic generative model that implements cross-model self-supervised learning, enabling adaptation to audio-visual data. The video model features a Gaussian mixture model embedded in a linear subspace of a sprite which translates in the video. The system can learn to detect and enhance speech in noise given only a short (30-second) sequence of audio-visual data. We show some results for speech detection and enhancement, and discuss extensions to the model that are under investigation.
  • Keywords
    Gaussian processes; adaptive signal processing; feature extraction; inference mechanisms; learning (artificial intelligence); speech processing; video signal processing; Gaussian mixture video model; audio-visual data adaptation; audio-visual graphical models; audio-visual speech phonetic content; cross-model self-supervised learning; feature extraction; inference rules; learning rules; probabilistic generative model; speech detection; speech enhancement; speech processing; sprite linear subspace; video tracking; visual lip-reading; Acoustic noise; Acoustical engineering; Assembly systems; Audio-visual systems; Fuses; Graphical models; Lips; Speech enhancement; Speech processing; Working environment noise;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-8484-9
  • Type

    conf

  • DOI
    10.1109/ICASSP.2004.1327194
  • Filename
    1327194