Regional Information Center for Science and Technology - Realistic Mouth-Synching for Speech-Driven Talking Face Using Articulatory Modelling

DocumentCode :

1231804

Title :

Realistic Mouth-Synching for Speech-Driven Talking Face Using Articulatory Modelling

Author :

Xie, Lei ; Liu, Zhi-Qiang

Author_Institution :

Sch. of Creative Media, City Univ. of Hong Kong

Volume :

Issue :

fYear :

2007

fDate :

4/1/2007

Firstpage :

500

Lastpage :

510

Abstract :

This paper presents an articulatory modelling approach to convert acoustic speech into realistic mouth animation. We directly model the movements of articulators, such as lips, tongue, and teeth, using a dynamic Bayesian network (DBN)-based audio-visual articulatory model (AVAM). A multiple-stream structure with a shared articulator layer is adopted in the model to synchronously associate the two building blocks of speech, i.e., audio and video. This model not only describes the synchronization between visual articulatory movements and audio speech, but also reflects the linguistic fact that different articulators evolve asynchronously. We also present a Baum-Welch DBN inversion (DBNI) algorithm to generate optimal facial parameters from audio, given the trained AVAM, under the maximum likelihood (ML) criterion. Extensive objective and subjective evaluations on the JEWEL audio-visual dataset demonstrate that, compared with phonemic HMM approaches, facial parameters estimated by our approach follow the true parameters more accurately, and the synthesized facial animation sequences are so lively that 38% of them are undistinguishable

Keywords :

belief networks; computer animation; learning (artificial intelligence); maximum likelihood estimation; speech-based user interfaces; Baum-Welch DBN inversion algorithm; acoustic speech; articulatory modelling; audio-visual articulatory model; dynamic Bayesian network; facial animation; maximum likelihood criterion; mouth-synching; phonemic HMM approach; realistic mouth animation; Articulatory model; Baum–Welch DBN inversion (DBNI); dynamic Bayesian networks (DBNs); facial animation; mouth-synching; talking face;

fLanguage :

English

Journal_Title :

Multimedia, IEEE Transactions on

Publisher :

IEEE

ISSN :

1520-9210

Type :

jour

DOI :

10.1109/TMM.2006.888009

Filename :

4130381

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1231804