Realistic mouth animation based on an articulatory DBN model with constrained asynchrony

Author

Jiang, Dongmei ; Ravyse, Ilse ; Liu, Peizhen ; Sahli, Hichem ; Verhelst, Werner

Author_Institution

VUB-NPU Joint Res. Group on Audio Visual Signal Process. (AVSP), Northwestern Polytech. Univ., Xi'an, China

fYear

2010

fDate

14-19 March 2010

Firstpage

2478

Lastpage

2481

Abstract

In this paper, we propose an approach to convert acoustic speech to video-realistic mouth animation based on an articulatory dynamic Bayesian network model with constrained asynchrony (AF_AVDBN). Conditional probability distributions are defined to control the asynchronies between articulators such as the lips, tongue and glottis/velum. An EM-based conversion algorithm is also presented to learn the optimal visual features given an auditory input and the trained AF_AVDBN parameters. In training the AF_AVDBN models, downsampled YUV spatial frequency features of the interpolated mouth image sequences are extracted as visual features. To reproduce the mouth animation sequence from the learned visual features, spatial upsampling and temporal downsampling are applied. Both qualitative and quantitative results show that the proposed method produces more natural and realistic mouth animations, and that it improves accuracy over the state-of-the-art multi-stream Hidden Markov Model (MSHMM) and the articulatory DBN model without asynchrony constraint (AF_DBN).

Keywords

belief networks; computer animation; constraint handling; feature extraction; hearing; hidden Markov models; image sequences; interpolation; speech processing; statistical distributions; EM-based conversion algorithm; acoustic speech; articulatory DBN model; articulatory dynamic Bayesian network model; auditory input; constrained asynchrony; downsampled YUV spatial frequency; feature extraction; interpolated mouth image sequences; multistream Hidden Markov Model; optimal visual features; probability distributions; spatial upsampling; temporal downsampling; video realistic mouth animation; Animation; Bayesian methods; Frequency; Hidden Markov models; Image converters; Lips; Mouth; Probability distribution; Speech; Tongue; AF_AVDBN; AF_DBN; asynchrony; conditional probability distribution; mouth animation;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on

Conference_Location

Dallas, TX

ISSN

1520-6149

Print_ISBN

978-1-4244-4295-9

Electronic_ISBN

1520-6149

Type

conf

DOI

10.1109/ICASSP.2010.5494894

Filename

5494894