Regional Information Center for Science and Technology - Minimum Kullback–Leibler Divergence Parameter Generation for HMM-Based Speech Synthesis

DocumentCode :

1413600

Title :

Minimum Kullback–Leibler Divergence Parameter Generation for HMM-Based Speech Synthesis

Author :

Ling, Zhen-Hua ; Dai, Li-Rong

Author_Institution :

iFLYTEK Speech Lab., Univ. of Sci. & Technol. of China, Hefei, China

Volume :

Issue :

fYear :

2012

fDate :

7/1/2012

Firstpage :

1492

Lastpage :

1502

Abstract :

This paper presents a parameter generation method for hidden Markov model (HMM)-based statistical parametric speech synthesis that uses a similarity measure for probability distributions. In contrast to conventional maximum output probability parameter generation (MOPPG), the method we propose derives a parameter generation criterion from the distribution characteristics of the generated acoustic features. Kullback-Leibler (KL) divergence between the sentence HMM used for parameter generation and the HMM estimated from the generated features is calculated by upper bound approximation. During parameter generation, this KL divergence is minimized either by optimizing the generated acoustic parameters directly or by applying a linear transform to the MOPPG outputs. Our experiments show both these approaches are effective for alleviating over-smoothing in the generated spectral features and for improving the naturalness of synthetic speech. Compared with the direct optimization approach, which is susceptible to over-fitting, the feature transform approach gives better performance. In order to reduce the computational complexity of transform estimation, an offline training method is further developed to estimate a global transform under the minimum KL divergence criterion for the training set. Experimental results show that this global transform is as effective as the transform estimated for each sentence at synthesis stage.

Keywords :

approximation theory; hidden Markov models; optimisation; speech synthesis; statistical distributions; wavelet transforms; HMM; KL divergence criterion; Kullback-Leibler divergence; MOPPG method; alleviating oversmoothing; feature transform approach; global transform; hidden Markov model; linear transform; maximum output probability parameter generation; offline training method; optimization; probability distributions; similarity measure; speech synthesis; statistical parameter; upper bound approximation; Acoustics; Context modeling; Estimation; Hidden Markov models; Speech; Training; Transforms; Hidden Markov model (HMM); Kullback–Leibler (KL) divergence; parameter generation; speech synthesis

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2011.2182511

Filename :

6121942

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1413600