Title :
The use of a formant diagram in audiovisual speech activity detection
Author :
van Bree, K.C. ; Belt, H.J.W.
Author_Institution :
Video Processing Systems Group, Philips Research, Eindhoven, Netherlands
Abstract :
We present an audiovisual approach to voice activity detection for systems with a single microphone and a single camera, where multiple people may be in the camera's field of view. Our aim is to obtain a separate speech activity detection result for each person. The approach uses face tracking and lip contour tracking algorithms for the video analysis, and pitch presence detection and formant frequency tracking algorithms for the audio analysis. When the audio indicates speech activity and the video shows lip activity for more than one person, we check for each person whether the vowels detected in the audio correspond to that person's mouth parameters in the video, in order to determine whether that person is speaking. To this end we use the F1-F2 speech formant diagram, in which we propose three vowel groups that are distinguishable from both audio and video data.
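The per-person check described in the abstract can be illustrated with a minimal sketch. It assumes that the pitch detector marks unvoiced frames as None, that the formant tracker supplies (F1, F2) in Hz for voiced frames, and that the lip tracker supplies mouth height, width and a per-person neutral width. The three groups used here (open, spread, rounded) and all thresholds (F1_OPEN_HZ, F2_FRONT_HZ, OPEN_ASPECT, SPREAD_WIDTH_RATIO) are hypothetical stand-ins; the abstract does not state the paper's actual group boundaries or mouth parameters.

```python
from enum import Enum
from typing import Dict, List, Optional, Tuple


class VowelGroup(Enum):
    OPEN = "open"        # e.g. /a/: high F1, large jaw opening
    SPREAD = "spread"    # e.g. /i/, /e/: low F1, high F2, wide lips
    ROUNDED = "rounded"  # e.g. /u/, /o/: low F1, low F2, narrow rounded lips


# Hypothetical thresholds, chosen for illustration only (not from the paper).
F1_OPEN_HZ = 600.0        # F1 at or above this counts as an open vowel
F2_FRONT_HZ = 1500.0      # F2 at or above this counts as a front (spread) vowel
OPEN_ASPECT = 0.5         # mouth height / width at or above this counts as open
SPREAD_WIDTH_RATIO = 1.0  # width / neutral width at or above this counts as spread

Formants = List[Optional[Tuple[float, float]]]  # (F1, F2) per frame, None if unvoiced
MouthFrame = Tuple[float, float, float]         # (height, width, neutral_width), width > 0


def audio_group(f1: float, f2: float) -> VowelGroup:
    """Place a voiced frame in one of three regions of the F1-F2 plane."""
    if f1 >= F1_OPEN_HZ:
        return VowelGroup.OPEN
    if f2 >= F2_FRONT_HZ:
        return VowelGroup.SPREAD
    return VowelGroup.ROUNDED


def video_group(height: float, width: float, neutral_width: float) -> VowelGroup:
    """Map lip-contour measurements for one frame to the same three groups."""
    if height / width >= OPEN_ASPECT:
        return VowelGroup.OPEN
    if width / neutral_width >= SPREAD_WIDTH_RATIO:
        return VowelGroup.SPREAD
    return VowelGroup.ROUNDED


def agreement(formants: Formants, mouth: List[MouthFrame]) -> float:
    """Fraction of voiced frames whose audio and video vowel groups match."""
    voiced = [(fm, m) for fm, m in zip(formants, mouth) if fm is not None]
    if not voiced:
        return 0.0
    hits = sum(audio_group(*fm) == video_group(*m) for fm, m in voiced)
    return hits / len(voiced)


def attribute_speaker(formants: Formants,
                      tracks: Dict[str, List[MouthFrame]]) -> Optional[str]:
    """Return the person whose mouth shapes best match the vowels, or None."""
    if not tracks:
        return None
    scores = {pid: agreement(formants, frames) for pid, frames in tracks.items()}
    best = max(scores, key=lambda pid: scores[pid])
    return best if scores[best] > 0.0 else None
```

As a usage example, with formants [(730, 1090), (270, 2290), None, (300, 870)] (roughly /a/, /i/, an unvoiced frame, /u/) and two tracked people whose lips both move, attribute_speaker picks the person whose mouth is open, then spread, then rounded in step with those vowels. Comparing coarse groups rather than predicting exact lip dimensions from formants keeps the check robust to speaker-dependent formant values and to noisy lip contours, which is in the spirit of choosing groups that are distinguishable from both modalities.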
Keywords :
audio signal processing; microphones; speech processing; video signal processing; audio analysis; audiovisual speech activity detection; camera field; formant diagram; frequency tracking algorithms; lip contour tracking algorithm; pitch presence detection; single camera; single microphone; video analysis; video data; voice activity detection; Detectors; Lips; Mouth; Shape; Speech; Speech processing;
Conference_Title :
15th European Signal Processing Conference, 2007
Conference_Location :
Poznan
Print_ISBN :
978-839-2134-04-6