Title :
Visual prosody: facial movements accompanying speech
Author :
Graf, Hans Peter ; Cosatto, Eric ; Strom, Volker ; Huang, Fu Jie
Author_Institution :
AT&T Labs Res., Middletown, NJ, USA
Abstract :
As we articulate speech, we usually move the head and exhibit various facial expressions. This visual aspect of speech aids understanding and helps communicate additional information, such as the speaker's mood. We quantitatively analyze the head and facial movements that accompany speech and investigate how they relate to the text's prosodic structure. We recorded several hours of speech and measured the locations of the speakers' main facial features as well as their head poses. The text was evaluated with a prosody prediction tool, identifying phrase boundaries and pitch accents. Characteristic of most speakers are simple motion patterns that are repeatedly applied in synchrony with the main prosodic events. The direction and strength of head movements vary widely from one speaker to another, yet their timing is typically well synchronized with the spoken text. A quantitative understanding of the correlations between head movements and spoken text is important for synthesizing photo-realistic talking heads. Talking heads appear much more engaging when they exhibit realistic motion patterns.
Keywords :
computer animation; face recognition; image motion analysis; realistic images; speech processing; facial expressions; facial movements; head movements; head poses; motion patterns; photorealistic talking heads; pitch accents; prosodic structure; prosody prediction tool; speech; spoken text; visual prosody; Contracts; Eyes; Facial animation; Humans; Mood; Muscles; Psychology; Read only memory; Speech synthesis; Timing;
Conference_Title :
Automatic Face and Gesture Recognition, 2002. Proceedings. Fifth IEEE International Conference on
Conference_Location :
Washington, DC, USA
Print_ISBN :
0-7695-1602-5
DOI :
10.1109/AFGR.2002.1004186