DocumentCode
706299
Title
The use of a formant diagram in audiovisual speech activity detection
Author
van Bree, K.C. ; Belt, H.J.W.
Author_Institution
Video Process. Syst. Group, Philips Res., Eindhoven, Netherlands
fYear
2007
fDate
3-7 Sept. 2007
Firstpage
2390
Lastpage
2394
Abstract
We present an audiovisual approach to the problem of voice activity detection for systems with a single microphone and a single camera with multiple people in the camera's field of view. We aim to have a speech activity detection result per person. The approach utilizes a face tracking and lip contour tracking algorithm for the video analysis, and pitch presence detection and formant frequency tracking algorithms for the audio analysis. When from the audio we detect speech activity and from the video we find lip activity for more than a single person, we check for each person whether the vowels correspond with the video mouth parameters to find out if this person speaks. To this end we make use of the F1-F2 speech formant diagram in which we propose three vowel groups that are distinctive both from audio and video data.
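The per-person decision logic described in the abstract might be sketched as follows. This is a minimal illustration and not the authors' implementation. The function name, the integer vowel-group labels, and the fallback behaviour (a single lip-active person is taken as the speaker; an unknown audio vowel group leaves all lip-active people marked as speaking) are assumptions introduced here. The abstract does not say how the three vowel groups are defined, so they are treated as opaque labels produced by separate audio and video analysis stages.

```python
from typing import Dict, Optional


def speech_activity_per_person(
    audio_speech: bool,
    audio_vowel_group: Optional[int],
    lip_active: Dict[str, bool],
    video_vowel_group: Dict[str, Optional[int]],
) -> Dict[str, bool]:
    """Hypothetical sketch: decide, per tracked person, whether they speak.

    audio_speech      -- speech detected in the single-microphone signal
    audio_vowel_group -- vowel group label from the F1-F2 formant analysis,
                         or None if no vowel group could be assigned
    lip_active        -- per person: lip activity found in the video
    video_vowel_group -- per person: vowel group label inferred from the
                         mouth parameters, or None if unavailable
    """
    # No speech in the audio: nobody is marked as speaking.
    if not audio_speech:
        return {person: False for person in lip_active}

    candidates = [person for person, active in lip_active.items() if active]

    # Assumption: with at most one lip-active person, or with no usable
    # audio vowel group, lip activity alone decides.
    if len(candidates) <= 1 or audio_vowel_group is None:
        return {person: person in candidates for person in lip_active}

    # More than one lip-active person: keep only those whose video-derived
    # vowel group corresponds with the audio-derived one.
    return {
        person: person in candidates
        and video_vowel_group.get(person) == audio_vowel_group
        for person in lip_active
    }
```

For example, with speech detected, audio vowel group 2, and two lip-active people whose video vowel groups are 2 and 0, only the first person is marked as speaking.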
Keywords
audio signal processing; microphones; speech processing; video signal processing; audio analysis; audiovisual speech activity detection; camera field; formant diagram; frequency tracking algorithms; lip contour tracking algorithm; pitch presence detection; single camera; single microphone; video analysis; video data; voice activity detection; Detectors; Lips; Mouth; Shape; Speech; Speech processing;
fLanguage
English
Publisher
IEEE
Conference_Titel
Signal Processing Conference, 2007 15th European
Conference_Location
Poznan
Print_ISBN
978-839-2134-04-6
Type
conf
Filename
7099236