  • DocumentCode
    2787697
  • Title
    Realistic mouth animation based on an articulatory DBN model with constrained asynchrony
  • Author
    Jiang, Dongmei; Ravyse, Ilse; Liu, Peizhen; Sahli, Hichem; Verhelst, Werner
  • Author_Institution
    VUB-NPU Joint Res. Group on Audio Visual Signal Process. (AVSP), Northwestern Polytech. Univ., Xi'an, China
  • fYear
    2010
  • fDate
    14-19 March 2010
  • Firstpage
    2478
  • Lastpage
    2481
  • Abstract
    In this paper, we propose an approach to convert acoustic speech into video-realistic mouth animation based on an articulatory dynamic Bayesian network model with constrained asynchrony (AF_AVDBN). Conditional probability distributions are defined to control the asynchronies between articulators such as the lips, tongue and glottis/velum. An EM-based conversion algorithm is also presented to learn the optimal visual features given an auditory input and the trained AF_AVDBN parameters. To train the AF_AVDBN models, downsampled YUV spatial frequency features of the interpolated mouth image sequences are extracted as visual features. To reproduce the mouth animation sequence from the learned visual features, spatial upsampling and temporal downsampling are applied. Both qualitative and quantitative results show that the proposed method is capable of producing more natural and realistic mouth animations, with higher accuracy than the state-of-the-art multi-stream hidden Markov model (MSHMM) and the articulatory DBN model without asynchrony constraint (AF_DBN).
  • Keywords
    belief networks; computer animation; constraint handling; feature extraction; hearing; hidden Markov models; image sequences; interpolation; speech processing; statistical distributions; EM-based conversion algorithm; acoustic speech; articulatory DBN model; articulatory dynamic Bayesian network model; auditory input; constrained asynchrony; downsampled YUV spatial frequency; feature extraction; interpolated mouth image sequences; multistream Hidden Markov Model; optimal visual features; probability distributions; spatial upsampling; temporal downsampling; video realistic mouth animation; Animation; Bayesian methods; Frequency; Hidden Markov models; Image converters; Lips; Mouth; Probability distribution; Speech; Tongue; AF_AVDBN; AF_DBN; asynchrony; conditional probability distribution; mouth animation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2010 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • Conference_Location
    Dallas, TX
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-4295-9
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2010.5494894
  • Filename
    5494894