Title :
Informative subspaces for audio-visual processing: High-level function from low-level fusion
Author :
Fisher, John W., III; Darrell, Trevor
Author_Institution :
Massachusetts Institute of Technology, Cambridge, 02139, USA
Abstract :
We propose a new probabilistic model of single source multi-modal generation, and show algorithms for maximizing mutual information which find correspondences between signal components. We show a nonparametric method for finding informative subspaces that captures complex statistical relationships between different modalities. We extend a previous subspace method to include new priors on the projection weights, yielding more robust results. Applied to human speakers, our model finds a relationship between audio speech and video of facial motion, and partially segments background events in both channels. We present new results on the problem of audio-visual verification, and show how the audio and video of a speaker can be matched without a prior model of the speaker's voice or appearance.
Keywords :
Artificial neural networks; Gold; Pixel;
Conference_Title :
Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on
Conference_Location :
Orlando, FL, USA
Print_ISBN :
0-7803-7402-9
DOI :
10.1109/ICASSP.2002.5745560