Title :
Audio-Visual Group Recognition Using Diffusion Maps
Author :
Keller, Yosi ; Coifman, Ronald R. ; Lafon, Stéphane ; Zucker, Steven W.
Author_Institution :
Sch. of Eng., Bar Ilan Univ., Israel
Abstract :
Data fusion is a natural and common approach to recovering the state of physical systems, but the dissimilar appearance of data from different sensors remains a fundamental obstacle. We propose a unified embedding scheme for multisensory data, based on the spectral diffusion framework, which addresses this issue. Our scheme is purely data-driven and assumes no a priori statistical or deterministic models of the data sources. To extract the underlying structure, we first embed each input channel separately; the resulting structures are then combined in diffusion coordinates. In particular, because different sensors sample similar phenomena with different sampling densities, we apply the density-invariant Laplace-Beltrami embedding. Sampling density is a fundamental issue in multisensor acquisition and processing that was overlooked in prior approaches. We extend previous work on group recognition and suggest a novel approach to the selection of diffusion coordinates. To verify our approach, we demonstrate performance improvements in audio/visual speech recognition.
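The two-stage scheme described in the abstract (embed each channel separately with a density-invariant normalization, then combine in diffusion coordinates) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the Gaussian kernel, the bandwidth `eps`, the function names `laplace_beltrami_embedding` and `fuse_channels`, the use of the leading coordinates, and fusion by simple concatenation are all assumptions made for illustration. In particular, the paper proposes its own coordinate-selection approach, which is not reproduced here.

```python
import numpy as np


def laplace_beltrami_embedding(X, eps, n_coords=2):
    """Diffusion-map embedding with alpha = 1 (density-invariant) normalization.

    X is an (n_samples, n_features) array; eps is an assumed kernel bandwidth.
    Returns the leading non-trivial eigenvalues and diffusion coordinates.
    """
    # Pairwise squared Euclidean distances, clipped against round-off.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)

    # Gaussian affinity kernel (an assumed choice of kernel).
    K = np.exp(-d2 / eps)

    # alpha = 1 normalization: divide out the kernel density estimate q
    # so that the result does not depend on the sampling density.
    q = K.sum(axis=1)
    K1 = K / np.outer(q, q)

    # Symmetric conjugate of the row-stochastic Markov matrix.
    d = K1.sum(axis=1)
    S = K1 / np.sqrt(np.outer(d, d))

    # Eigen-decomposition, sorted by decreasing eigenvalue.
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]

    # Right eigenvectors of the Markov matrix; index 0 is the trivial
    # constant eigenvector (eigenvalue 1) and is skipped.
    psi = vecs / np.sqrt(d)[:, None]
    lam = vals[1:n_coords + 1]
    return lam, psi[:, 1:n_coords + 1] * lam


def fuse_channels(channels, eps_list, n_coords=2):
    """Embed each channel separately, then concatenate the coordinates.

    Concatenation is an assumed fusion rule used only for illustration.
    """
    parts = [
        laplace_beltrami_embedding(X, eps, n_coords)[1]
        for X, eps in zip(channels, eps_list)
    ]
    return np.hstack(parts)
```

As a usage example, two hypothetical channels (say, audio and visual feature arrays with the same number of rows, one row per time-aligned sample) could be passed as `fuse_channels([audio, video], [eps_audio, eps_video])`. The fused coordinates could then be fed to any standard classifier. The rows must be aligned across channels for concatenation to be meaningful.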
Keywords :
sensor fusion; speech recognition; audio-visual group recognition; data fusion; density invariant Laplace-Beltrami embedding; diffusion maps; multisensory data; spectral diffusion framework; dimensionality reduction; Laplacian eigenmaps; multisensor
Journal_Title :
Signal Processing, IEEE Transactions on
DOI :
10.1109/TSP.2009.2030861