• DocumentCode
    1799388
  • Title

    Realtime speech-driven facial animation using Gaussian Mixture Models

  • Author

    Changwei Luo ; Yu Jun ; Xian Li ; Zengfu Wang

  • Author_Institution
    Dept. of Autom., Univ. of Sci. & Technol. of China, Hefei, China
  • fYear
    2014
  • fDate
    14-18 July 2014
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Speech-driven facial animation synthesis is the process of animating a virtual face according to an input audio signal. Audio-to-visual conversion is the core of speech-driven facial animation. In this paper, Gaussian Mixture Models (GMM) are employed for audio-to-visual conversion. The conventional GMM-based method performs the conversion frame by frame using minimum mean square error estimation. We consider two issues with the conventional method: the influence of previous visual features on the current visual feature is not considered, and GMM training and conversion are inconsistent. To address these issues, we propose incorporating previous visual features into the conversion. We also propose a minimum-conversion-error-based approach to refine the GMM parameters. Experiments on a publicly available database show that our method can accurately convert audio features into visual features. The conversion accuracy is comparable to that of a current state-of-the-art trajectory-based approach. Based on the proposed method, we develop a speech-driven facial animation system that runs in real time and outputs realistic speech animations.
  • Keywords
    Gaussian processes; computer animation; estimation theory; learning (artificial intelligence); mean square error methods; mixture models; speech synthesis; GMM; Gaussian mixture model; audio-to-visual conversion; minimum mean square error estimation; realtime speech-driven facial animation synthesis; trajectory-based approach; virtual face animating process; Facial animation; Principal component analysis; Shape; Training; Vectors; Visualization; speech-driven
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Multimedia and Expo Workshops (ICMEW), 2014 IEEE International Conference on
  • Conference_Location
    Chengdu
  • ISSN
    1945-7871
  • Type

    conf

  • DOI
    10.1109/ICMEW.2014.6890554
  • Filename
    6890554