Regional Information Center for Science and Technology - Speaker detection using the timing structure of lip motion and sound

DocumentCode :

2121270

Title :

Speaker detection using the timing structure of lip motion and sound

Author :

Horii, Yu ; Kawashima, Hiroaki ; Matsuyama, Takashi

Author_Institution :

Grad. Sch. of Inf., Kyoto Univ., Kyoto

fYear :

2008

fDate :

23-28 June 2008

Firstpage :

Lastpage :

Abstract :

In this paper, we propose a novel approach to speaker detection that integrates audio-visual information using timing structure as a cue. We first extract feature sequences of lip motion and sound, and segment each of them into temporal intervals. Then, we construct a cross-media timing-structure model of human speech by learning the temporal relations of overlapping intervals. Based on the learned model, we realize speaker detection by evaluating the timing structure of the observed video and audio. Our experimental results show the effectiveness of using temporal relations of intervals for speaker detection.

Keywords :

gesture recognition; speaker recognition; audio-visual information; cross-media timing structure model; human speech; lip motion; speaker detection; Data mining; Face detection; Feature extraction; Hidden Markov models; Humans; Loudspeakers; Microphone arrays; Motion detection; Speech recognition; Timing

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Computer Vision and Pattern Recognition Workshops, 2008. CVPRW '08. IEEE Computer Society Conference on

Conference_Location :

Anchorage, AK

ISSN :

2160-7508

Print_ISBN :

978-1-4244-2339-2

Electronic_ISBN :

2160-7508

Type :

conf

DOI :

10.1109/CVPRW.2008.4563183

Filename :

4563183

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2121270