Title :
Modeling pitch trajectory by hierarchical HMM with minimum generation error training
Author :
Wu, Yi-Jian ; Soong, Frank
Author_Institution :
Speech Group, Microsoft Res. Asia, Beijing, China
Abstract :
A hierarchical pitch model (HPM) was recently proposed for HMM-based speech synthesis. In HPM, the pitch trajectory is modeled as an additive combination of hierarchical layers (state, phone, syllable, etc.), and a minimum generation error (MGE) criterion is used to re-estimate the model parameters. In this paper, we extend the MGE criterion to the tree-based model clustering process so that the context-dependent models at all layers are clustered simultaneously, which yields a full MGE training process for HPM. Experiments were conducted to investigate the effects of different training criteria and different hierarchical layer combinations on HPM. Experimental results show that full MGE training significantly improves HPM's ability to predict the F0 trajectory in TTS over the ML-based approach on test data. The new HPM also outperforms the conventional state-level HMM in F0 prediction.
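The additive layering and the generation error described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the segment layout, the numeric values, and the function names `additive_trajectory` and `generation_error` are all hypothetical, and the sketch ignores dynamic features, variances, and decision-tree clustering. It only shows how per-layer contributions sum into one F0 trajectory and how a squared-error distance to a natural trajectory could be computed.

```python
# Illustrative sketch only: layer names, values, and function names are
# assumptions made for this example, not taken from the paper.

def additive_trajectory(layers, num_frames):
    """Sum the per-frame contributions of every layer.

    `layers` maps a layer name to a list of (start, end, offset) segments.
    Each segment covers frames start..end-1 and adds `offset` (log-F0 units).
    """
    trajectory = [0.0] * num_frames
    for segments in layers.values():
        for start, end, offset in segments:
            for t in range(start, end):
                trajectory[t] += offset
    return trajectory


def generation_error(generated, natural):
    """Squared Euclidean distance between generated and natural trajectories."""
    return sum((g - n) ** 2 for g, n in zip(generated, natural))


# Hypothetical four-frame utterance: one syllable, two phones, four states.
layers = {
    "syllable": [(0, 4, 5.0)],
    "phone": [(0, 2, 0.25), (2, 4, -0.25)],
    "state": [(0, 1, 0.5), (1, 2, 0.0), (2, 3, 0.0), (3, 4, -0.5)],
}
generated = additive_trajectory(layers, 4)
natural = [5.75, 5.0, 4.75, 4.25]  # made-up reference trajectory
error = generation_error(generated, natural)
```

In an MGE-style procedure, a quantity like `error` would be the objective that parameter updates try to reduce; how the paper actually performs those updates is not specified in this record.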
Keywords :
hidden Markov models; pattern clustering; speech synthesis; trees (mathematics); F0 trajectory; HMM-based speech synthesis; HPM training; MGE criterion; MGE training process; ML-based approach; context-dependent models; hierarchical HMM; hierarchical layers; minimum generation error criterion; minimum generation error training; pitch trajectory modeling; state-level HMM; test data; tree-based model clustering process; Additives; Context modeling; Training; Trajectory; Vectors; hidden Markov model; hierarchical pitch model; minimum generation error
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
Conference_Location :
Kyoto
Print_ISBN :
978-1-4673-0045-2
ISSN :
1520-6149
DOI :
10.1109/ICASSP.2012.6288799