DocumentCode :
3527055
Title :
Improved prosody generation by maximizing joint likelihood of state and longer units
Author :
Qian, Yao ; Wu, Zhizheng ; Soong, Frank K.
Author_Institution :
Microsoft Res. Asia, Beijing
fYear :
2009
fDate :
19-24 April 2009
Firstpage :
3781
Lastpage :
3784
Abstract :
The current state-of-the-art HMM-based TTS can produce highly intelligible output speech and deliver decent segmental quality. However, its prosody, especially at the phrase or sentence level, tends to be bland. This blandness is partially due to the fact that a state-based HMM is inadequate for modeling a global, hierarchical prosodic structure at the sentence or phrase level. In this study, the prosody of longer units is first modeled explicitly by appropriate parametric distributions. The resultant models are then integrated with the state-level baseline models to generate an optimal prosody by maximizing the joint likelihood of all units, from the state to longer units. Experimental results in both Mandarin and English show consistent improvements over the state-based baseline system. The improvements are both objectively measurable and subjectively perceivable.
Keywords :
Markov processes; speech intelligibility; speech synthesis; duration modelling; hidden Markov models; parametric distributions; pitch modelling; prosody generation; segmental quality; speech intelligibility; Asia; Covariance matrix; Degradation; Discrete cosine transforms; Educational institutions; Frequency; Gaussian distribution; Hidden Markov models; Software quality; Speech synthesis; DCT; Duration modeling; Gamma distribution; HMM-based TTS; Pitch Modeling;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on
Conference_Location :
Taipei
ISSN :
1520-6149
Print_ISBN :
978-1-4244-2353-8
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2009.4960450
Filename :
4960450