Title :
Lip movement generation using restricted Boltzmann machines for visual speech synthesis
Author :
Zheng-Chen Liu ; Zhen-Hua Ling ; Li-Rong Dai
Author_Institution :
Nat. Eng. Lab. for Speech & Language Inf. Process., Univ. of Sci. & Technol. of China, Hefei, China
Abstract :
This paper proposes methods that use restricted Boltzmann machines (RBMs) to generate sequences of lip images for visual speech synthesis. The aim of the proposed methods is to alleviate the over-smoothing effect of the conventional hidden Markov model (HMM)-based statistical approach to lip synthesis. Two model structures that use RBMs to model and generate lip movements are investigated. First, RBMs replace Gaussian distributions as the density functions of HMM states. Second, a deep belief network (DBN) is constructed by stacking multiple RBMs to model the joint distribution between the lip image of each frame and its corresponding context features. Experimental results show that the proposed methods significantly improve the quality of the generated lip images. The method that combines the DBN model structure with raw pixel features achieves the best performance in our experiments.
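For orientation, the textbook form of a binary RBM can be written as below. This is a general sketch, not necessarily the variant used in the paper: the abstract does not state the unit types, and models of real-valued lip features commonly use Gaussian visible units instead. The symbols v (visible units), h (hidden units), W (weights), a and b (biases), and Z (partition function) are notation assumed here for illustration.

```latex
% Energy of a joint configuration (v, h) in a binary RBM (assumed notation)
E(\mathbf{v}, \mathbf{h}) = -\mathbf{a}^{\top}\mathbf{v} - \mathbf{b}^{\top}\mathbf{h} - \mathbf{v}^{\top} W \mathbf{h}

% Joint distribution and the marginal over visible units
p(\mathbf{v}, \mathbf{h}) = \frac{1}{Z} \exp\bigl(-E(\mathbf{v}, \mathbf{h})\bigr),
\qquad
p(\mathbf{v}) = \frac{1}{Z} \sum_{\mathbf{h}} \exp\bigl(-E(\mathbf{v}, \mathbf{h})\bigr)

% Factorized conditionals, with \sigma(x) = 1 / (1 + e^{-x})
p(h_j = 1 \mid \mathbf{v}) = \sigma\Bigl(b_j + \sum_i v_i W_{ij}\Bigr),
\qquad
p(v_i = 1 \mid \mathbf{h}) = \sigma\Bigl(a_i + \sum_j W_{ij} h_j\Bigr)
```

Because the marginal p(v) sums over all hidden configurations, it can represent multimodal densities that a single Gaussian cannot, which is consistent with the abstract's motivation of reducing over-smoothing. Stacking such layers, with the hidden units of one RBM serving as the visible units of the next, gives the DBN structure mentioned above.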
Keywords :
Boltzmann machines; belief networks; hidden Markov models; image sequences; speech synthesis; DBN; HMM; RBM; deep belief network; hidden Markov model; lip image sequence; lip movement generation; restricted Boltzmann machine; visual speech synthesis; Feature extraction; Principal component analysis; Speech; Training; Visualization
Conference_Title :
Signal and Information Processing (ChinaSIP), 2015 IEEE China Summit and International Conference on
Conference_Location :
Chengdu
DOI :
10.1109/ChinaSIP.2015.7230475