Expressive Facial Animation Synthesis by Learning Speech Coarticulation and Expression Spaces

Author

Deng, Zhigang ; Neumann, Ulrich ; Lewis, J.P. ; Kim, Tae-Yong ; Bulut, Murtaza ; Narayanan, Shrikanth

Author_Institution

Dept. of Comput. Sci., Houston Univ., TX

Volume

12

Issue

6

fYear

2006

Firstpage

1523

Lastpage

1534

Abstract

Synthesizing expressive facial animation is a very challenging topic within the graphics community. In this paper, we present an expressive facial animation synthesis system enabled by automated learning from facial motion capture data. Accurate 3D motions of the markers on the face of a human subject are captured while he/she recites a predesigned corpus, with specific spoken and visual expressions. We present a novel motion capture mining technique that "learns" speech coarticulation models for diphones and triphones from the recorded data. A phoneme-independent expression eigenspace (PIEES) that encloses the dynamic expression signals is constructed by motion signal processing (phoneme-based time-warping and subtraction) and principal component analysis (PCA) reduction. New expressive facial animations are synthesized in three steps. First, the learned coarticulation models are concatenated to synthesize neutral visual speech according to novel speech input. Then, a texture-synthesis-based approach is used to generate a novel dynamic expression signal from the PIEES model. Finally, the synthesized expression signal is blended with the synthesized neutral visual speech to create the final expressive facial animation. Our experiments demonstrate that the system can effectively synthesize realistic expressive facial animation.
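
The subtraction, PCA reduction, and blending steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the expressive and neutral marker motions are already time-aligned (the phoneme-based time-warping is skipped), uses synthetic random data in place of motion capture, and treats blending as a simple weighted addition. The texture-synthesis step is not shown. All function names, array shapes, and the number of components are hypothetical.

```python
import numpy as np

def build_expression_eigenspace(expressive, neutral, n_components):
    """Subtract aligned neutral motion, then reduce the residual with PCA.

    expressive, neutral: (frames, dims) arrays, assumed already time-aligned.
    Returns the mean expression signal and the top principal directions.
    """
    expression_signal = expressive - neutral
    mean = expression_signal.mean(axis=0)
    centered = expression_signal - mean
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def project(signal, mean, basis):
    """Map an expression signal to eigenspace coefficients."""
    return (signal - mean) @ basis.T

def reconstruct(coeffs, mean, basis):
    """Map eigenspace coefficients back to marker space."""
    return coeffs @ basis + mean

def blend(neutral_speech, expression_signal, weight=1.0):
    """Assumed blending rule: add the weighted expression signal."""
    return neutral_speech + weight * expression_signal

# Synthetic stand-in for motion capture: 200 frames, 30 marker coordinates,
# with an expression signal that lies in a 3-dimensional subspace.
rng = np.random.default_rng(0)
frames, dims, rank = 200, 30, 3
neutral = rng.normal(size=(frames, dims))
expressive = neutral + rng.normal(size=(frames, rank)) @ rng.normal(size=(rank, dims))

mean, basis = build_expression_eigenspace(expressive, neutral, n_components=rank)
coeffs = project(expressive - neutral, mean, basis)
animation = blend(neutral, reconstruct(coeffs, mean, basis))
```

Because the synthetic expression signal has rank 3, three components reconstruct it to floating-point precision; real capture data would need a component count chosen from the retained variance.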

Keywords

computer animation; data mining; emotion recognition; face recognition; learning (artificial intelligence); motion estimation; speech processing; speech synthesis; PCA reduction; computer graphics; diphones; expression spaces; expressive facial animation synthesis system; motion capture mining technique; motion signal processing; phoneme-independent expression eigenspace; principal component analysis; speech coarticulation learning; texture-synthesis-based approach; triphones; Concatenated codes; Face; Facial animation; Graphics; Humans; Motion analysis; Principal component analysis; Signal processing; Signal synthesis; Speech synthesis; animation synthesis; data-driven; expressive speech; motion capture; speech coarticulation; texture synthesis; Artificial Intelligence; Facial Expression; Image Interpretation, Computer-Assisted; Imaging, Three-Dimensional; Models, Biological; Speech; Speech Production Measurement

fLanguage

English

Journal_Title

Visualization and Computer Graphics, IEEE Transactions on

Publisher

IEEE

ISSN

1077-2626

Type

jour

DOI

10.1109/TVCG.2006.90

Filename

1703372