Title : 
Product of Experts for Statistical Parametric Speech Synthesis
            Author : 
Zen, Heiga; Gales, Mark J. F.; Nankaku, Yoshihiko; Tokuda, Keiichi
            Author_Institution : 
Nagoya Institute of Technology, Nagoya, Japan
            Date : 
1 March 2012
            Abstract : 
Multiple acoustic models are often combined in statistical parametric speech synthesis. Both linear and non-linear functions of an observation sequence are used as features to be modeled. This paper shows that this combination of multiple acoustic models can be expressed as a product of experts (PoE); the likelihoods from the models are scaled, multiplied together, and then normalized. Normally, these models are individually trained and only combined at the synthesis stage. This paper discusses a more consistent PoE framework where the models are jointly trained. A training algorithm for PoEs based on linear feature functions and Gaussian experts is derived by generalizing the training algorithm for trajectory HMMs. However, for non-linear feature functions or non-Gaussian experts this is not possible, so a scheme based on contrastive divergence learning is described. Experimental results show that the PoE framework provides both a mathematically elegant way to train multiple acoustic models jointly and significant improvements in the quality of the synthesized speech.
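
A minimal sketch of the combination described above; the notation here (f_m for the m-th feature function, gamma_m for its scaling weight, Z for the normalization term, M for the number of experts) is illustrative and not taken from the paper itself:

\[
p(\mathbf{o} \mid \lambda) \;=\; \frac{1}{Z(\lambda)} \prod_{m=1}^{M} p_m\!\big(f_m(\mathbf{o}) \mid \lambda_m\big)^{\gamma_m}
\]

That is, each expert evaluates its own (linear or non-linear) feature function f_m of the observation sequence o; the resulting likelihoods are raised to their scaling weights, multiplied together, and normalized by Z(\lambda), matching the scaled-multiplied-normalized combination stated in the abstract.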
            Keywords : 
acoustic signal processing; hidden Markov models; speech synthesis; PoE framework; acoustic model; contrastive divergence learning; non-Gaussian expert; non-linear feature function; product of experts (PoE); statistical parametric speech synthesis; training algorithm; trajectory hidden Markov model (HMM); acoustics; adaptation models; speech; trajectory
            Journal_Title : 
IEEE Transactions on Audio, Speech, and Language Processing
            DOI : 
10.1109/TASL.2011.2165280