DocumentCode :
2449725
Title :
Look who's talking: speaker detection using video and audio correlation
Author :
Cutler, Ross ; Davis, Larry
Author_Institution :
Inst. for Adv. Comput. Studies, Maryland Univ., College Park, MD, USA
Volume :
3
fYear :
2000
fDate :
2000
Firstpage :
1589
Abstract :
The visual motion of the mouth and the corresponding audio data generated when a person speaks are highly correlated. This fact has been exploited for lip/speech-reading and for improving speech recognition. We describe a method of automatically detecting a talking person (both spatially and temporally) using video and audio data from a single microphone. The audio-visual correlation is learned using a time-delayed neural network, which is then used to perform a spatio-temporal search for a speaking person. Applications include videoconferencing, video indexing, and improving human-computer interaction (HCI). An example HCI application is provided.
Keywords :
audio-visual systems; computer vision; correlation methods; delays; gesture recognition; learning (artificial intelligence); neural nets; speaker recognition; speech-based user interfaces; audio correlation; audio-visual correlation learning; human-computer interaction; lip-reading; microphone; mouth visual motion; spatio-temporal search; speaker detection; speech recognition; speech-reading; talking person detection; time delayed neural network; video correlation; video indexing; videoconferencing; Animation; Application software; Face detection; Human computer interaction; Indexing; Microphones; Mouth; Neural networks; Speech recognition; Videoconference;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Multimedia and Expo, 2000. ICME 2000. 2000 IEEE International Conference on
Conference_Location :
New York, NY
Print_ISBN :
0-7803-6536-4
Type :
conf
DOI :
10.1109/ICME.2000.871073
Filename :
871073