Accurate visible speech synthesis based on concatenating variable length motion capture data

Author

Ma, Jiyong ; Cole, Ron ; Pellom, Bryan ; Ward, Wayne ; Wise, Barbara

Author_Institution

Center for Spoken Language Research, University of Colorado, Boulder, CO, USA

Volume

12

Issue

2

Year

2006

Firstpage

266

Lastpage

276

Abstract

We present a novel approach to synthesizing accurate visible speech based on searching for and concatenating optimal variable-length units in a large corpus of motion capture data. Based on a set of visual prototypes selected on a source face and a corresponding set designated for a target face, we propose a machine learning technique to automatically map the facial motions observed on the source face to the target face. To model the long-distance coarticulation effects in visible speech, a large-scale corpus covering the most common syllables in English was collected, annotated, and analyzed. For any input text, a search algorithm that locates the optimal sequence of concatenated units for synthesis is described. A new algorithm to adapt lip motions from a generic 3D face model to a specific 3D face model is also proposed. A complete, end-to-end visible speech animation system is implemented based on the approach. This system is currently used in more than 60 kindergarten through third grade classrooms to teach students to read using a lifelike conversational animated agent. To evaluate the quality of the visible speech produced by the animation system, both subjective and objective evaluations are conducted. The evaluation results show that the proposed approach is accurate and powerful for visible speech synthesis.

Keywords

computer animation; face recognition; image motion analysis; learning (artificial intelligence); search problems; solid modelling; speech synthesis; 3D face model; coarticulation effect; facial motion; lip motion; machine learning technique; motion capture data; optimal sequence; search algorithm; speech animation system; visible speech synthesis; visual prototype; Concatenated codes; Facial animation; Humans; Large-scale systems; Lips; Machine learning; Prototypes; Speech analysis; Speech processing; Speech synthesis; Face animation; character animation; coarticulation effect; virtual human; visible speech; visual speech; Algorithms; Artificial Intelligence; Computer Graphics; Computer Simulation; Face; Image Enhancement; Image Interpretation, Computer-Assisted; Imaging, Three-Dimensional; Information Storage and Retrieval; Models, Biological; Movement; Pattern Recognition, Automated; Reproducibility of Results; Sensitivity and Specificity; Speech; Speech Production Measurement; User-Computer Interface; Video Recording

Language

English

Journal_Title

IEEE Transactions on Visualization and Computer Graphics

Publisher

IEEE

ISSN

1077-2626

Type

jour

DOI

10.1109/TVCG.2006.18

Filename

1580460