DocumentCode :
2882050
Title :
Informative subspaces for audio-visual processing: High-level function from low-level fusion
Author :
Fisher, John W., III ; Darrell, Trevor
Author_Institution :
Massachusetts Institute of Technology, Cambridge, 02139, USA
Volume :
4
fYear :
2002
fDate :
13-17 May 2002
Abstract :
We propose a new probabilistic model of single source multi-modal generation, and show algorithms for maximizing mutual information which find correspondences between signal components. We show a nonparametric method for finding informative subspaces that captures complex statistical relationships between different modalities. We extend a previous subspace method to include new priors on the projection weights, yielding more robust results. Applied to human speakers, our model finds a relationship between audio speech and video of facial motion, and partially segments background events in both channels. We present new results on the problem of audio-visual verification, and show how the audio and video of a speaker can be matched without a prior model of the speaker's voice or appearance.
Keywords :
Artificial neural networks; Gold; Pixel;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on
Conference_Location :
Orlando, FL, USA
ISSN :
1520-6149
Print_ISBN :
0-7803-7402-9
Type :
conf
DOI :
10.1109/ICASSP.2002.5745560
Filename :
5745560