DocumentCode :
2221953
Title :
Multimodal speaker localization in a probabilistic framework
Author :
Gurban, Mihai ; Thiran, Jean-Philippe
Author_Institution :
Signal Process. Inst., Ecole Polytech. Fed. de Lausanne, Lausanne, Switzerland
fYear :
2006
fDate :
4-8 Sept. 2006
Firstpage :
1
Lastpage :
5
Abstract :
A multimodal probabilistic framework is proposed for the problem of finding the active speaker in a video sequence. We localize the current speaker's mouth in the image by using the video and audio channels together. We propose a novel visual feature that is well suited to analyzing the movement of the mouth. After estimating the joint probability density of the audio and visual features, we can find the most probable location of the current speaker's mouth in a sequence of images. The proposed method is tested on the CUAVE audio-visual database, yielding improved results compared to other approaches from the literature.
Keywords :
audio-visual systems; feature extraction; image sequences; probability; speaker recognition; video signal processing; CUAVE audio-visual database; active speaker; audio channel; audio feature; image sequence; joint probability density estimation; mouth movement analysis; multimodal probabilistic framework; multimodal speaker localization; video channel; video sequence; visual feature; Abstracts; Speech; Tracking; Visualization;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Signal Processing Conference, 2006 14th European
Conference_Location :
Florence
ISSN :
2219-5491
Type :
conf
Filename :
7071490