DocumentCode :
542706
Title :
Identification of speakers in movie dialogs using audiovisual cues
Author :
Li, Ying ; Narayanan, Shrikanth ; Kuo, C.-C. Jay
Author_Institution :
Integrated Media Systems Center and Department of Electrical Engineering, University of Southern California, Los Angeles, 90089-2564, USA
Volume :
2
fYear :
2002
fDate :
13-17 May 2002
Abstract :
The problem of identifying speakers from a movie dialog scene is addressed in this paper. While most previous work on speaker identification has been carried out using pure audio data, more robust results could be obtained by integrating knowledge from multiple media sources, such as visual and audio information, when they are available. In this work, we first identify and isolate speech segments from the background by applying video shot detection, audio classification, and adaptive silence detection techniques. A decision is then made based on the likelihood computed between the incoming speech data and pre-trained speaker/background models. Moreover, to verify the effectiveness of the adaptive silence detector, we have compared it with a statistically trained silence model. Experimental results show that the proposed algorithm can achieve approximately 84% identification accuracy by integrating multiple media cues.
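The likelihood-based decision step mentioned in the abstract can be illustrated with a minimal sketch. This record does not say which model family or features the paper uses, so the diagonal-covariance Gaussian models, the two-dimensional feature frames, and every name below (`avg_log_likelihood`, `identify`, `speaker_A`, `speaker_B`, `background`) are assumptions made purely for illustration, not the authors' implementation.

```python
import math


def avg_log_likelihood(frames, mean, var):
    """Average per-frame log-likelihood under a diagonal-covariance Gaussian.

    ASSUMPTION: one Gaussian per model; the paper's actual models are not
    described in this record.
    """
    total = 0.0
    for frame in frames:
        for x, m, v in zip(frame, mean, var):
            total += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)


def identify(frames, models):
    """Score a speech segment against every model and return the best label."""
    scores = {
        label: avg_log_likelihood(frames, mean, var)
        for label, (mean, var) in models.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical pre-trained models: label -> (mean vector, variance vector).
models = {
    "speaker_A": ([0.0, 0.0], [1.0, 1.0]),
    "speaker_B": ([3.0, 3.0], [1.0, 1.0]),
    "background": ([-3.0, 0.0], [2.0, 2.0]),
}

# Toy feature frames for one isolated speech segment (made-up values).
segment = [[2.8, 3.1], [3.2, 2.9], [2.9, 3.3]]
label, scores = identify(segment, models)
print(label)
```

Including a background model alongside the speaker models lets a segment that slipped past the silence and audio-classification stages be rejected rather than forced onto a speaker.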
Keywords :
Adaptation model; Image segmentation; Motion pictures; Speech; Speech processing; Speech recognition; TV;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on
Conference_Location :
Orlando, FL, USA
ISSN :
1520-6149
Print_ISBN :
0-7803-7402-9
Type :
conf
DOI :
10.1109/ICASSP.2002.5745047
Filename :
5745047