Title :
Speaker Detection and Applications to Cross-Modal Analysis of Planning Meetings
Author :
Fang, Bing ; Xiong, Yingen ; Quek, Francis
Author_Institution :
Virginia Polytech. Inst. & State Univ., Blacksburg, VA, USA
Abstract :
Detecting meeting events is one of the most important tasks in the multimodal analysis of planning meetings, and speaker detection is a key step in extracting most meaningful meeting events. In this paper, we present an approach to speaker localization that combines visual and audio information for multimodal meeting analysis. When people talk, their speech is accompanied by mouth movements and hand gestures. By computing the correlation among audio signals, mouth movements, and hand motion, we detect a talking person both spatially and temporally. Three kinds of features are extracted for speaker localization: hand movements are expressed by hand motion efforts; audio features are expressed by 12 mel-frequency cepstral coefficients computed from the audio signals; and mouth movements are expressed by normalized cross-correlation coefficients of the mouth area between two successive frames. A time delay neural network is trained to learn the correlation relationships and is then applied to perform speaker localization. Experiments and applications in planning meeting environments are provided.
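As an illustration of the mouth-movement feature mentioned in the abstract, the following is a minimal sketch of a normalized cross-correlation coefficient between the mouth regions of two successive frames. The abstract does not say which variant of the coefficient is used, so the zero-mean form below is an assumption, and the function names `normalized_cross_correlation` and `mouth_ncc_series` are hypothetical. Patches are assumed to be already cropped, equally sized grayscale regions given as lists of rows.

```python
import math


def normalized_cross_correlation(patch_a, patch_b):
    """Zero-mean normalized cross-correlation of two equally sized patches.

    Assumption: the zero-mean variant is used here for illustration; the
    abstract does not specify the exact formula.
    """
    a = [float(p) for row in patch_a for p in row]
    b = [float(p) for row in patch_b for p in row]
    if not a or len(a) != len(b):
        raise ValueError("patches must be non-empty and the same size")

    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    dev_a = [x - mean_a for x in a]
    dev_b = [y - mean_b for y in b]

    numerator = sum(x * y for x, y in zip(dev_a, dev_b))
    denominator = math.sqrt(sum(x * x for x in dev_a) * sum(y * y for y in dev_b))
    if denominator == 0.0:
        # A perfectly flat patch has no variance, so the coefficient is
        # undefined; returning 0.0 is a convention chosen for this sketch.
        return 0.0
    return numerator / denominator


def mouth_ncc_series(mouth_patches):
    """Coefficient for each pair of successive mouth patches in a sequence."""
    return [
        normalized_cross_correlation(prev, curr)
        for prev, curr in zip(mouth_patches, mouth_patches[1:])
    ]


if __name__ == "__main__":
    # Toy 3x3 patches standing in for cropped mouth regions (made-up values).
    closed = [[10, 10, 10], [10, 50, 10], [10, 10, 10]]
    opened = [[10, 50, 10], [50, 50, 50], [10, 50, 10]]
    print(mouth_ncc_series([closed, closed, opened]))
```

Under this sketch, a coefficient near 1 suggests the mouth region barely changed between frames, while lower values suggest movement. How such values are combined with the audio and hand-motion features is left to the time delay neural network described in the abstract and is not reproduced here.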
Keywords :
audio signal processing; gesture recognition; motion compensation; neural nets; planning; speaker recognition; audio signals; cross-modal analysis; hand motion; mouth movements; planning meetings; speaker detection; speaker localization; time delay neural network; Cepstral analysis; Data mining; Delay effects; Event detection; Feature extraction; Information analysis; Meeting planning; Motion detection; Mouth; Speech; audio signal analysis; hand motion; meeting analysis; meeting event detection; mouth movement; multimodal meeting analysis; planning meeting; speaker localization;
Conference_Title :
Multimedia, 2009. ISM '09. 11th IEEE International Symposium on
Conference_Location :
San Diego, CA
Print_ISBN :
978-1-4244-5231-6
Electronic_ISBN :
978-0-7695-3890-7
DOI :
10.1109/ISM.2009.66