• DocumentCode
    149774
  • Title
    Mapping sounds onto images using binaural spectrograms
  • Author
    Deleforge, Antoine ; Drouard, Vincent ; Girin, Laurent ; Horaud, Radu
  • Author_Institution
    INRIA Grenoble Rhône-Alpes, Grenoble, France
  • fYear
    2014
  • fDate
    1-5 Sept. 2014
  • Firstpage
    2470
  • Lastpage
    2474
  • Abstract
    We propose a novel method for mapping sound spectrograms onto images, thus enabling alignment between auditory and visual features for subsequent multimodal processing. We take a supervised learning approach to this audio-visual fusion problem, in two steps. First, we use a Gaussian mixture of locally-linear regressions to learn a mapping from image locations to binaural spectrograms. Second, we derive a closed-form expression for the conditional posterior probability of an image location, given both an observed spectrogram emitted from an unknown source direction and the previously learnt mapping parameters. Notably, the proposed method can handle completely different spectrograms for training and for alignment: fixed-length wide-spectrum sounds are used for learning, so that the regression is fully and robustly estimated, while variable-length sparse-spectrum sounds, e.g., speech, are used for alignment. The proposed method successfully extracts the image location of speech utterances in realistic reverberant-room scenarios.
  • Keywords
    Gaussian processes; image processing; learning (artificial intelligence); mixture models; probability; speech processing; Gaussian mixture; audio-visual fusion problem; auditory features; binaural spectrograms; closed-form expression; conditional posterior probability; fixed-length wide-spectrum sounds; image locations; locally-linear regressions; realistic reverberant-room scenarios; sound mapping; speech utterances; subsequent multimodal processing; supervised learning approach; variable-length sparse-spectrum sounds; visual features; Acoustics; Spectrogram; Speech; Speech processing; Training; Vectors; Visualization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Proceedings of the 22nd European Signal Processing Conference (EUSIPCO), 2014
  • Conference_Location
    Lisbon
  • Type
    conf
  • Filename
    6952934