DocumentCode :
1304888
Title :
Audio-Visual Group Recognition Using Diffusion Maps
Author :
Keller, Yosi ; Coifman, Ronald R. ; Lafon, Stéphane ; Zucker, Steven W.
Author_Institution :
Sch. of Eng., Bar Ilan Univ., Israel
Volume :
58
Issue :
1
fYear :
2010
Firstpage :
403
Lastpage :
413
Abstract :
Data fusion is a natural and common approach to recovering the state of physical systems. But the dissimilar appearance of different sensors remains a fundamental obstacle. We propose a unified embedding scheme for multisensory data, based on the spectral diffusion framework, which addresses this issue. Our scheme is purely data-driven and assumes no a priori statistical or deterministic models of the data sources. To extract the underlying structure, we first embed each input channel separately; the resultant structures are then combined in diffusion coordinates. In particular, as different sensors sample similar phenomena with different sampling densities, we apply the density-invariant Laplace-Beltrami embedding. This is a fundamental issue in multisensor acquisition and processing, overlooked in prior approaches. We extend previous work on group recognition and suggest a novel approach to the selection of diffusion coordinates. To verify our approach, we demonstrate performance improvements in audio/visual speech recognition.
Keywords :
sensor fusion; speech recognition; audio-visual group recognition; data fusion; density invariant Laplace-Beltrami embedding; diffusion maps; multisensory data; spectral diffusion framework; Dimensionality reduction; Laplacian eigenmaps; multisensor
fLanguage :
English
Journal_Title :
Signal Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1053-587X
Type :
jour
DOI :
10.1109/TSP.2009.2030861
Filename :
5210209