Title :
Head Movement Synthesis Based on Semantic and Prosodic Features for a Chinese Expressive Avatar
Author :
Zhang, Shen ; Wu, Zhiyong ; Meng, Helen M. ; Cai, Lianhong
Author_Institution :
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing
Abstract :
This paper proposes an approach to text-to-visual speech synthesis in which synthetic head movements are rendered on an expressive talking avatar speaking Cantonese Chinese. The input text consists of descriptive information sourced from the Hong Kong tourism domain. The text is segmented into prosodic words (PW), and we adopt the PAD (pleasure-arousal-dominance) model to describe the expressivity of each prosodic word based on its semantics. Within a PW, we consider two prosodic features relevant to head movement synthesis, namely the stress and tone of the Chinese syllable. We designed and recorded an audiovisual speech corpus and analyzed the data to derive statistical correspondences between different (P,A) values for a Chinese prosodic word and head movement coordinates. These statistics guide parameter selection in a sinusoidal movement model. Corpus analyses also enable us to locate "peak points" of head movements that are synchronized with prosodic features within a prosodic word. These peak points inform the design of three heuristics that control head movements within a prosodic word. Perceptual evaluation based on the expressive talking avatar shows that head movement synthesis raises the mean opinion score (MOS) by 1.04 points on average, compared to a baseline that shows only lip articulations without head movements.
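The sinusoidal movement model mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the model's equations or parameter values, so everything below is an assumption made for illustration: the function names `amplitude_from_arousal`, `head_angle` and `trajectory`, the linear mapping from arousal to amplitude, the degree ranges, the half-cycle nod per prosodic word, and the 25 fps frame rate are all hypothetical and are not taken from the paper.

```python
import math


def amplitude_from_arousal(arousal, min_deg=1.0, max_deg=6.0):
    """Hypothetical mapping: arousal in [-1, 1] to a nod amplitude in degrees.

    The linear form and the degree range are illustrative assumptions,
    not the corpus statistics reported in the paper.
    """
    a = max(-1.0, min(1.0, arousal))  # clamp to the assumed PAD range
    return min_deg + (max_deg - min_deg) * (a + 1.0) / 2.0


def head_angle(t, duration, amplitude_deg, cycles=0.5, phase=0.0):
    """Head rotation angle (degrees) at time t within one prosodic word.

    With cycles=0.5 the head starts at rest, peaks mid-word and returns.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    return amplitude_deg * math.sin(2.0 * math.pi * cycles * t / duration + phase)


def trajectory(duration, amplitude_deg, frame_rate=25.0):
    """Sample the sinusoid once per video frame over the prosodic word."""
    n_frames = int(round(duration * frame_rate))
    return [
        head_angle(i / frame_rate, duration, amplitude_deg)
        for i in range(n_frames + 1)
    ]


if __name__ == "__main__":
    amp = amplitude_from_arousal(1.0)        # most aroused case in this sketch
    angles = trajectory(duration=0.4, amplitude_deg=amp)
    print([round(x, 2) for x in angles])
```

In this sketch a single amplitude per prosodic word stands in for the statistically selected parameters; the paper's "peak point" heuristics, which align the movement with stress and tone, are not modelled here.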
Keywords :
avatars; gesture recognition; rendering (computer graphics); speech synthesis; statistical analysis; Cantonese Chinese; Chinese expressive avatar; Chinese prosodic word; Chinese syllable; Hong Kong tourism domain; PAD model; audiovisual speech corpus; descriptive information; head movement synthesis; lip articulations; peak points location; prosodic features; semantic; sinusoidal movement model; talking avatar; text-to-visual speech synthesis; Atherosclerosis; Audio recording; Avatars; Computer science; Data analysis; Magnetic heads; Speech analysis; Speech synthesis; Stress; Video recording; PAD emotional model; expressivity; talking head; text genres; visual prosody;
Conference_Title :
Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on
Conference_Location :
Honolulu, HI
Print_ISBN :
1-4244-0727-3
ISSN :
1520-6149
DOI :
10.1109/ICASSP.2007.367043