Title :
Speech-driven facial animation using a hierarchical model
Author :
Cosker, D.P. ; Marshall, A.D. ; Rosin, P.L. ; Hicks, Y.A.
Author_Institution :
Sch. of Comput. Sci., Cardiff Univ., UK
Abstract :
A system capable of producing near video-realistic animation of a speaker given only speech inputs is presented. The audio input is a continuous speech signal that requires no phonetic labelling and is speaker-independent. The system requires only a short video training corpus of a subject speaking a list of viseme-targeted words in order to achieve convincing, realistic facial synthesis. The system learns the natural mouth and face dynamics of a speaker, allowing new facial poses, unseen in the training video, to be synthesised. To achieve this, the authors developed a novel approach which utilises a hierarchical and nonlinear principal components analysis (PCA) model that couples speech and appearance. Animation of different facial areas, defined by the hierarchy, is performed separately and merged in post-processing using an algorithm which combines texture and shape PCA data. It is shown that the model is capable of synthesising videos of a speaker using new audio segments from both previously heard and unheard speakers.
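The core idea of a PCA model that couples speech and appearance can be sketched as follows. This is an illustrative toy example only, not the authors' hierarchical implementation: it uses synthetic data, a single linear PCA over concatenated audio and appearance features, and a hypothetical `audio_to_appearance` helper that estimates appearance coefficients from audio alone.

```python
# Sketch (assumption, not the paper's method): a joint PCA over concatenated
# audio and appearance features; at synthesis time, appearance is recovered
# from audio alone via the shared principal components.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data: 200 frames, 12 audio features, 30 appearance features,
# generated from 4 shared latent factors so the coupling is exact.
n_frames, n_audio, n_app = 200, 12, 30
latent = rng.normal(size=(n_frames, 4))
audio = latent @ rng.normal(size=(4, n_audio))
appearance = latent @ rng.normal(size=(4, n_app))

# Stack audio and appearance into one joint feature vector per frame.
joint = np.hstack([audio, appearance])
mean = joint.mean(axis=0)
centred = joint - mean

# PCA via SVD; keep the top k joint components.
k = 4
_, _, vt = np.linalg.svd(centred, full_matrices=False)
components = vt[:k]                       # shape (k, n_audio + n_app)

# Split each joint component into its audio and appearance parts.
comp_audio = components[:, :n_audio]
comp_app = components[:, n_audio:]

def audio_to_appearance(a):
    """Fit joint-model coefficients from the audio part only (least squares),
    then reconstruct the appearance part from those coefficients."""
    coeffs, *_ = np.linalg.lstsq(comp_audio.T, a - mean[:n_audio], rcond=None)
    return mean[n_audio:] + coeffs @ comp_app

# A new audio frame drives a synthesised appearance vector.
synth = audio_to_appearance(audio[0])
print(synth.shape)  # (30,)
```

Because the synthetic data lies exactly in a 4-dimensional subspace, the recovered appearance matches the true frame; with real speech and face data the mapping is only approximate, which is one motivation for the nonlinear, hierarchical model described in the abstract.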
Keywords :
audio signal processing; computer animation; principal component analysis; speech processing; video signal processing; cluster modelling; hierarchical facial model; principal component analysis model; speech signal; speech-driven facial animation;
Journal_Title :
IEE Proceedings - Vision, Image and Signal Processing
DOI :
10.1049/ip-vis:20040752