Title :
Improved minimum converted trajectory error training for real-time speech-to-lips conversion
Author :
Han, Wei ; Wang, Lijuan ; Soong, Frank ; Yuan, Bo
Abstract :
Gaussian mixture model (GMM) based speech-to-lips conversion typically operates in one of two modes: batch conversion, or sliding window-based conversion for real-time processing. Minimum Converted Trajectory Error (MCTE) training was previously proposed to improve the performance of batch conversion. In this paper, we extend that work and propose a new training criterion, MCTE for Real-time conversion (R-MCTE), to explicitly optimize the quality of sliding window-based conversion. In R-MCTE, we use the probabilistic descent method to refine model parameters by minimizing the error of real-time converted visual trajectories over the training data. Objective evaluations on the LIPS 2008 Visual Speech Synthesis Challenge data set show that the proposed method achieves both good lip animation performance and low delay in real-time conversion.
Keywords :
computer animation; learning (artificial intelligence); probability; speech processing; Gaussian mixture model; LIPS 2008 Visual Speech Synthesis Challenge data set; MCTE; batch conversion; lip animation performance; minimum converted trajectory error training; model parameters; objective evaluations; probabilistic descent method; real-time conversion; real-time converted visual trajectories; real-time processing; real-time speech-to-lips conversion; sliding window-based conversion; training data; Hidden Markov models; Real time systems; Speech; Training; Trajectory; Visualization; minimum converted trajectory error; speech-to-lips
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
Conference_Location :
Kyoto
Print_ISBN :
978-1-4673-0045-2
Electronic_ISBN :
1520-6149
DOI :
10.1109/ICASSP.2012.6288921