DocumentCode :
745828
Title :
Cross-Modal Localization via Sparsity
Author :
Kidron, Einat ; Schechner, Yoav Y. ; Elad, Michael
Author_Institution :
Dept. of Electr. Eng., Technion-Israel Inst. Technol., Haifa
Volume :
55
Issue :
4
fYear :
2007
fDate :
4/1/2007
Firstpage :
1390
Lastpage :
1404
Abstract :
Cross-modal analysis is a natural progression beyond processing of single-source signals. Simultaneous processing of two sources can reveal information that is unavailable when handling the sources separately. Indeed, human and animal perception, computer vision, weather forecasting, and various other scientific and technological fields can benefit from such a paradigm. A particular cross-modal problem is localization: out of the entire data array originating from one source, localize the components that best correlate with the other. For example, auditory and visual data sampled from a scene can be used to localize visual events associated with the sound track. In this paper, we present a rigorous analysis of fundamental problems associated with the localization task. We then develop an approach that leads efficiently to a unique, high-definition localization outcome. Our method is based on canonical correlation analysis (CCA), where inherent ill-posedness is removed by exploiting sparsity of cross-modal events. We apply our approach to localization of audio-visual events. The proposed algorithm grasps such dynamic audio-visual events with high spatial resolution. The algorithm effectively detects the pixels that are associated with sound, while filtering out other dynamic pixels, overcoming substantial visual distractions and audio noise. The algorithm is simple and efficient thanks to its reliance on linear programming, while being free of user-defined parameters.
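The abstract's central idea, resolving an ill-posed correlation problem by preferring a sparse solution found with linear programming, can be illustrated with a generic sketch. This is not the authors' algorithm: it assumes a simplified setting in which a one-dimensional audio feature `a` (one value per frame) must be reproduced exactly by a weighted sum of pixel signals `V` (frames by pixels), with fewer frames than pixels. The names `sparsest_weights`, `V`, `a`, and `w_true`, the synthetic data, and the equality constraint `V @ w = a` are all illustrative assumptions. The solver call is `scipy.optimize.linprog`.

```python
import numpy as np
from scipy.optimize import linprog


def sparsest_weights(V, a):
    """Minimize ||w||_1 subject to V @ w = a, posed as a linear program.

    Hypothetical helper for illustration; not taken from the paper.
    """
    n_frames, n_pixels = V.shape
    # Split w = w_pos - w_neg with both parts nonnegative, so the L1 norm
    # becomes the linear objective sum(w_pos) + sum(w_neg).
    cost = np.ones(2 * n_pixels)
    A_eq = np.hstack([V, -V])
    res = linprog(cost, A_eq=A_eq, b_eq=a, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n_pixels] - res.x[n_pixels:]


# Synthetic data (assumption): 20 frames, 50 "pixels", 3 of which drive the audio.
rng = np.random.default_rng(0)
n_frames, n_pixels = 20, 50
V = rng.standard_normal((n_frames, n_pixels))
w_true = np.zeros(n_pixels)
w_true[[3, 17, 41]] = [1.0, -2.0, 0.5]
a = V @ w_true

w = sparsest_weights(V, a)
print("largest-magnitude weights at pixels:", np.argsort(-np.abs(w))[:3])
```

With fewer frames than pixels, infinitely many weight vectors satisfy `V @ w = a`; minimizing the L1 norm selects one with few nonzero entries, and the pixels carrying those entries are the candidate sound-related locations. The L1 problem needs no tuning weight, which is in the spirit of the abstract's remark about the absence of user-defined parameters, although the paper's actual formulation may differ from this sketch.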
Keywords :
audio signal processing; audio-visual systems; image resolution; linear programming; audio noise; canonical correlation analysis; cross-modal localization; data array; dynamic audio-visual events; dynamic pixels; high spatial resolution; single-source signals processing; substantial visual distractions; Animals; Computer vision; Filtering algorithms; Heuristic algorithms; Humans; Layout; Signal analysis; Signal processing; Spatial resolution; Weather forecasting; cross-sensor fusion; multimedia; multimodal analysis; multisensor fusion; overfitting; regularization; stochastic analysis
fLanguage :
English
Journal_Title :
Signal Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1053-587X
Type :
jour
DOI :
10.1109/TSP.2006.888095
Filename :
4133038