DocumentCode :
1413600
Title :
Minimum Kullback–Leibler Divergence Parameter Generation for HMM-Based Speech Synthesis
Author :
Ling, Zhen-Hua ; Dai, Li-Rong
Author_Institution :
iFLYTEK Speech Lab., Univ. of Sci. & Technol. of China, Hefei, China
Volume :
20
Issue :
5
fYear :
2012
fDate :
7/1/2012 12:00:00 AM
Firstpage :
1492
Lastpage :
1502
Abstract :
This paper presents a parameter generation method for hidden Markov model (HMM)-based statistical parametric speech synthesis that uses a similarity measure for probability distributions. In contrast to conventional maximum output probability parameter generation (MOPPG), the proposed method derives its parameter generation criterion from the distribution characteristics of the generated acoustic features. The Kullback-Leibler (KL) divergence between the sentence HMM used for parameter generation and the HMM estimated from the generated features is calculated by an upper bound approximation. During parameter generation, this KL divergence is minimized either by optimizing the generated acoustic parameters directly or by applying a linear transform to the MOPPG outputs. Our experiments show that both approaches are effective at alleviating over-smoothing in the generated spectral features and at improving the naturalness of synthetic speech. The feature transform approach performs better than the direct optimization approach, which is susceptible to over-fitting. To reduce the computational complexity of transform estimation, an offline training method is further developed that estimates a global transform under the minimum KL divergence criterion over the training set. Experimental results show that this global transform is as effective as a transform estimated for each sentence at the synthesis stage.
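The criterion described in the abstract can be illustrated with a simplified sketch. It assumes single diagonal-Gaussian states, a fixed state segmentation shared by the sentence HMM and the re-estimated HMM, static features only (no delta features), and the direction KL(model || estimated); under a fixed alignment the divergence reduces to a duration-weighted sum of per-state Gaussian KL divergences. This is an illustration of the idea, not necessarily the exact bound or formulation used in the paper, and the function names and the variance floor are hypothetical. The sketch only evaluates the criterion; it does not perform the minimization over features or over a linear transform.

```python
import numpy as np

def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """KL(p || q) between two diagonal Gaussians, summed over dimensions."""
    return 0.5 * float(np.sum(np.log(var_q / var_p)
                              + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0))

def estimate_state_gaussians(features, durations, var_floor=1e-4):
    """ML mean/variance of the generated frames assigned to each state.

    Assumption: a fixed segmentation, i.e. the first durations[0] frames
    belong to state 0, the next durations[1] frames to state 1, and so on.
    var_floor (hypothetical) keeps flat, over-smoothed segments from
    producing zero variance.
    """
    stats, start = [], 0
    for d in durations:
        seg = features[start:start + d]
        stats.append((seg.mean(axis=0), np.maximum(seg.var(axis=0), var_floor)))
        start += d
    return stats

def kl_upper_bound(model_states, features, durations):
    """Duration-weighted sum of per-state KL(model || estimated HMM)."""
    est = estimate_state_gaussians(features, durations)
    return sum(d * kl_diag_gauss(mu_p, var_p, mu_q, var_q)
               for (mu_p, var_p), (mu_q, var_q), d
               in zip(model_states, est, durations))

# Two 1-D states with unit variance, 200 frames each.
model = [(np.array([0.0]), np.array([1.0])), (np.array([1.0]), np.array([1.0]))]
durs = [200, 200]
# An over-smoothed trajectory that sits exactly on the state means.
smooth = np.concatenate([np.zeros((200, 1)), np.ones((200, 1))])
# A trajectory whose frames actually follow the model distributions.
rng = np.random.default_rng(0)
sampled = np.concatenate([rng.normal(0.0, 1.0, (200, 1)),
                          rng.normal(1.0, 1.0, (200, 1))])
kl_smooth = kl_upper_bound(model, smooth, durs)
kl_sampled = kl_upper_bound(model, sampled, durs)
```

In this toy case the over-smoothed trajectory, which is what a maximum output probability criterion tends toward, yields a far larger divergence than the sampled one, because its within-state variance collapses toward the floor. That gap is the kind of mismatch a minimum KL divergence criterion penalizes.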
Keywords :
approximation theory; hidden Markov models; optimisation; speech synthesis; statistical distributions; wavelet transforms; HMM; KL divergence criterion; Kullback-Leibler divergence; MOPPG method; alleviating oversmoothing; feature transform approach; global transform; hidden Markov model; linear transform; maximum output probability parameter generation; offline training method; optimization; probability distributions; similarity measure; speech synthesis; statistical parameter; upper bound approximation; Acoustics; Context modeling; Estimation; Hidden Markov models; Speech; Training; Transforms; Hidden Markov model (HMM); Kullback–Leibler (KL) divergence; parameter generation; speech synthesis;
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
ieee
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2011.2182511
Filename :
6121942