Title :
Trainable videorealistic speech animation
Author :
Ezzat, Tony ; Geiger, Gadi ; Poggio, Tomaso
Author_Institution :
Center for Biol. & Comput. Learning, Massachusetts Inst. of Technol., Cambridge, MA, USA
Abstract :
We describe how to create a generative, videorealistic speech animation module using machine learning techniques. A human subject is first recorded with a video camera as he or she utters a predetermined speech corpus. After the corpus is processed automatically, a visual speech module is learned from the data; it is capable of synthesizing the human subject's mouth uttering entirely novel utterances that were not recorded in the original video. The synthesized utterance is re-composited onto a background sequence that contains natural head and eye movement. The final output is videorealistic in the sense that it looks like a video camera recording of the subject. At run time, the input to the system can be either real audio sequences or synthetic audio produced by a text-to-speech system, as long as it has been phonetically aligned.
Keywords :
computer animation; face recognition; image sequences; learning (artificial intelligence); speech synthesis; video cameras; machine learning techniques; synthesized utterance; text-to-speech system; video camera; videorealistic speech animation; visual speech module; Animation; Audio recording; Cameras; Humans; Machine learning; Magnetic heads; Mouth; Speech processing; Speech synthesis; Video recording;
Conference_Title :
Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004. Proceedings.
Print_ISBN :
0-7695-2122-3
DOI :
10.1109/AFGR.2004.1301509