• DocumentCode
    706299
  • Title
    The use of a formant diagram in audiovisual speech activity detection
  • Author
    van Bree, K.C. ; Belt, H.J.W.
  • Author_Institution
    Video Process. Syst. Group, Philips Res., Eindhoven, Netherlands
  • fYear
    2007
  • fDate
    3-7 Sept. 2007
  • Firstpage
    2390
  • Lastpage
    2394
  • Abstract
    We present an audiovisual approach to the problem of voice activity detection for systems with a single microphone and a single camera with multiple people in the camera's field of view. We aim to have a speech activity detection result per person. The approach utilizes a face tracking and lip contour tracking algorithm for the video analysis, and pitch presence detection and formant frequency tracking algorithms for the audio analysis. When from the audio we detect speech activity and from the video we find lip activity for more than a single person, we check for each person whether the vowels correspond with the video mouth parameters to find out if this person speaks. To this end we make use of the F1-F2 speech formant diagram in which we propose three vowel groups that are distinctive both from audio and video data.
  • Keywords
    audio signal processing; microphones; speech processing; video signal processing; audio analysis; audiovisual speech activity detection; camera field; formant diagram; frequency tracking algorithms; lip contour tracking algorithm; pitch presence detection; single camera; single microphone; video analysis; video data; voice activity detection; Detectors; Lips; Mouth; Shape; Speech; Speech processing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal Processing Conference, 2007 15th European
  • Conference_Location
    Poznan
  • Print_ISBN
    978-839-2134-04-6
  • Type
    conf
  • Filename
    7099236