Title : 
Product of Experts for Statistical Parametric Speech Synthesis
            Author : 
Zen, Heiga; Gales, Mark J. F.; Nankaku, Yoshihiko; Tokuda, Keiichi
            Author_Institution : 
Nagoya Institute of Technology, Nagoya, Japan
            Date : 
1 March 2012
            Abstract : 
Multiple acoustic models are often combined in statistical parametric speech synthesis. Both linear and non-linear functions of an observation sequence are used as features to be modeled. This paper shows that this combination of multiple acoustic models can be expressed as a product of experts (PoE); the likelihoods from the models are scaled, multiplied together, and then normalized. Normally, these models are individually trained and only combined at the synthesis stage. This paper discusses a more consistent PoE framework where the models are jointly trained. A training algorithm for PoEs based on linear feature functions and Gaussian experts is derived by generalizing the training algorithm for trajectory HMMs. However, for non-linear feature functions or non-Gaussian experts this is not possible, so a scheme based on contrastive divergence learning is described. Experimental results show that the PoE framework provides both a mathematically elegant way to train multiple acoustic models jointly and significant improvements in the quality of the synthesized speech.
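
A minimal sketch of the combination described above; the notation here (f_m for the m-th feature function, gamma_m for its scaling weight, Z for the normalization term, M for the number of experts) is illustrative and not taken from the paper itself:

\[
p(\mathbf{o} \mid \lambda) \;=\; \frac{1}{Z(\lambda)} \prod_{m=1}^{M} p_m\!\big(f_m(\mathbf{o}) \mid \lambda_m\big)^{\gamma_m}
\]

That is, each expert evaluates its own (linear or non-linear) feature function f_m of the observation sequence o; the resulting likelihoods are raised to their scaling weights, multiplied together, and normalized by Z(\lambda), matching the scaled-multiplied-normalized combination stated in the abstract.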
            Keywords : 
acoustic signal processing; hidden Markov models; speech synthesis; PoE framework; acoustic model; contrastive divergence learning; non-Gaussian expert; non-linear feature function; product of experts (PoE); statistical parametric speech synthesis; training algorithm; trajectory hidden Markov model (HMM); acoustics; adaptation models; speech; trajectory
            Journal_Title : 
IEEE Transactions on Audio, Speech, and Language Processing
            DOI : 
10.1109/TASL.2011.2165280