DocumentCode :
1878073
Title :
Learning cross-modal appearance models with application to tracking
Author :
Fisher, John W., III ; Darrell, Trevor
Author_Institution :
Artificial Intelligence Lab., Massachusetts Inst. of Technol., Cambridge, MA, USA
Volume :
2
fYear :
2003
fDate :
6-9 July 2003
Abstract :
Objects of interest are rarely silent or invisible. Analysis of multi-modal signal generation from a single object represents a rich and challenging area for smart sensor arrays. We consider the problem of simultaneously learning an audio and visual appearance model of a moving subject. We present a method which successfully learns such a model without the benefit of hand initialization, using only the associated audio signal to "decide" which object to model and track. We are interested in particular in modeling joint audio and video variation, such as that produced by a speaking face. We present an algorithm and experimental results for a human speaker moving in a scene.
Keywords :
image motion analysis; intelligent sensors; speaker recognition; audio appearance model; crossmodal appearance models; multi-modal signal generation; smart sensor arrays; speaking face; tracking; visual appearance model; Artificial intelligence; Intelligent sensors; Laboratories; Layout; Learning; Principal component analysis; Robustness; Sensor arrays; Signal analysis; Signal generators
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on
Print_ISBN :
0-7803-7965-9
Type :
conf
DOI :
10.1109/ICME.2003.1221541
Filename :
1221541