DocumentCode :
352468
Title :
From speech to talking faces: lip movements estimation based on linear approximators
Author :
Vignoli, F.
Author_Institution :
Genoa Univ.
Volume :
6
fYear :
2000
fDate :
2000
Firstpage :
2381
Abstract :
In human communication, speech understanding is greatly improved by the bimodal acoustic-visual effect, compared with speech alone. This is particularly clear when communication takes place in noisy environments or involves non-native speakers. In this paper, we propose a novel algorithm based on linear approximators that estimates lip movements from a timed sequence of phonemes. This sequence can be generated from real speech, by a segmentation technique based on a hidden Markov model (HMM), or from a text-to-speech system. The algorithm consists of two modules: the training module and the synthesis module. The training module is based on an eigen-analysis of an audiovisual database recorded for this purpose. The synthesis module takes the sequence of phonemes as input and implements an implicit coarticulation model. A subsequent post-processing step converts the estimated parameters into a sequence of facial animation parameters that are compliant with the new MPEG-4 standard. The algorithm has been tested with FAE (Facial Animation Engine), an MPEG-4 compliant system developed at the author's university.
Keywords :
approximation theory; audio-visual systems; code standards; computer animation; eigenvalues and eigenfunctions; face recognition; hidden Markov models; learning systems; motion estimation; parameter estimation; sequences; speech intelligibility; speech synthesis; subroutines; FAE; Facial Animation Engine; MPEG-4 compliant system; audiovisual database; bimodal acoustic-visual effect; eigen-analysis; facial animation parameter sequence; hidden Markov model; human communication; implicit coarticulation model; linear approximators; lip movement estimation; noisy environments; nonnative speakers; parameter estimation; post-processing; speech segmentation technique; speech synthesis module; speech understanding; talking faces; text-to-speech system; timed phoneme sequence; training module; Audio databases; Facial animation; Hidden Markov models; Humans; Linear approximation; Loudspeakers; MPEG 4 Standard; Parameter estimation; Speech synthesis; Working environment noise;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech, and Signal Processing, 2000. ICASSP '00. Proceedings. 2000 IEEE International Conference on
Conference_Location :
Istanbul
ISSN :
1520-6149
Print_ISBN :
0-7803-6293-4
Type :
conf
DOI :
10.1109/ICASSP.2000.859320
Filename :
859320