DocumentCode :
2798955
Title :
Improved modeling for F0 generation and V/U decision in HMM-based TTS
Author :
Zhang, Qingqing ; Soong, Frank ; Qian, Yao ; Yan, Zhijie ; Pan, Jielin ; Yan, Yonghong
fYear :
2010
fDate :
14-19 March 2010
Firstpage :
4606
Lastpage :
4609
Abstract :
HMM-based TTS can produce highly intelligible speech of decent quality. However, the synthesized speech sometimes exhibits perceptibly annoying glitches caused by F0 extraction errors in the training data and voiced/unvoiced (v/u) swapping errors in F0 generation. In conventional MSD-based F0 modeling [10], the two incompatible probabilistic spaces, a continuous probability density for voiced observations and a discrete probability for unvoiced observations, prevent the use of likelihood-based frame occupancy to alleviate the deteriorating effect of F0 extraction errors when training a more robust model for synthesis. In this paper, we propose a new approach to improve the modeling of the piece-wise continuous F0 trajectory and the v/u decision for HMM-based TTS. Voicing strength, characterized by the magnitude of the normalized correlation coefficient calculated during F0 feature extraction, is used as an additional feature in F0 modeling and for the v/u decision. Experimental results show that the new approach to F0 modeling and generation significantly outperforms the MSD-HMM method and a recently proposed GTD-HMM method [9]. The improvements are both objectively measurable and subjectively perceivable.
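As a rough illustration of the voicing-strength feature mentioned in the abstract, the sketch below computes a normalized correlation coefficient magnitude for one frame and applies a simple threshold. It is not the authors' method: the paper uses voicing strength as an additional feature inside the HMM, whereas the fixed threshold, the lag range, the 8 kHz sample rate, and all function names here are assumptions made for illustration only.

```python
import math
import random

def normalized_correlation(frame, lag):
    """Normalized correlation between a frame and a copy shifted by `lag` samples."""
    n = len(frame) - lag
    if lag <= 0 or n <= 0:
        raise ValueError("lag must be positive and shorter than the frame")
    a = frame[:n]
    b = frame[lag:lag + n]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    # A silent frame has zero energy; treat its correlation as 0.
    return num / den if den > 0.0 else 0.0

def voicing_strength(frame, min_lag, max_lag):
    """Largest correlation magnitude over the candidate pitch lags (assumed definition)."""
    return max(abs(normalized_correlation(frame, lag))
               for lag in range(min_lag, max_lag + 1))

def is_voiced(strength, threshold=0.5):
    """Hypothetical hard v/u decision; the threshold value is an assumption."""
    return strength >= threshold

# Illustrative frames at an assumed 8 kHz sample rate.
SAMPLE_RATE = 8000
MIN_LAG, MAX_LAG = 40, 160  # candidate F0 range of roughly 50-200 Hz

periodic = [math.sin(2 * math.pi * 100 * t / SAMPLE_RATE) for t in range(400)]
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(400)]
silence = [0.0] * 400

strength_periodic = voicing_strength(periodic, MIN_LAG, MAX_LAG)
strength_noise = voicing_strength(noise, MIN_LAG, MAX_LAG)
strength_silence = voicing_strength(silence, MIN_LAG, MAX_LAG)
```

A periodic frame yields a strength close to 1, while noise and silence yield low values, which is why such a measure can inform the v/u decision.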
Keywords :
hidden Markov models; speech synthesis; statistical analysis; F0 extraction errors; F0 generation; HMM-based TTS; V/U decision; continuous probability density; discrete probability; hidden Markov model; likelihood based frame occupancy; piece-wise continuous F0 trajectory; text-to-speech synthesis; Acoustics; Asia; Degradation; Feature extraction; Hidden Markov models; Laboratories; Robustness; Speech analysis; Speech synthesis; Training data; F0 generation; HMM-based TTS; V/U decision model; voicing strength;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on
Conference_Location :
Dallas, TX
ISSN :
1520-6149
Print_ISBN :
978-1-4244-4295-9
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2010.5495561
Filename :
5495561