Title :
Learning cross-modal appearance models with application to tracking
Author :
Fisher, John W., III ; Darrell, Trevor
Author_Institution :
Artificial Intelligence Lab., Massachusetts Inst. of Technol., Cambridge, MA, USA
Abstract :
Objects of interest are rarely silent or invisible. Analysis of multi-modal signal generation from a single object represents a rich and challenging area for smart sensor arrays. We consider the problem of simultaneously learning an audio and visual appearance model of a moving subject. We present a method which successfully learns such a model without the benefit of hand initialization, using only the associated audio signal to "decide" which object to model and track. We are interested in particular in modeling joint audio and video variation, such as that produced by a speaking face. We present an algorithm and experimental results for a human speaker moving in a scene.
Keywords :
image motion analysis; intelligent sensors; speaker recognition; audio appearance model; crossmodal appearance models; multi-modal signal generation; smart sensor arrays; speaking face; tracking; visual appearance model; Artificial intelligence; Intelligent sensors; Laboratories; Layout; Learning; Principal component analysis; Robustness; Sensor arrays; Signal analysis; Signal generators
Conference_Title :
Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on
Print_ISBN :
0-7803-7965-9
DOI :
10.1109/ICME.2003.1221541