Title :
A training method for average voice model based on shared decision tree context clustering and speaker adaptive training
Author :
Yamagishi, Junichi ; Masuko, T. ; Tokuda, Keiichi ; Kobayashi, Takehiko
Author_Institution :
Interdisciplinary Graduate Sch. of Sci. & Eng., Tokyo Inst. of Technol., Japan
Abstract :
This paper describes a new method for training the average voice model used in speech synthesis, in which an arbitrary speaker's voice is generated by speaker adaptation. When the amount of training data is limited, the distributions of the average voice model are often biased by speaker and/or gender, which degrades the quality of the synthetic speech. To reduce this speaker dependence, the proposed method incorporates a context clustering technique called shared decision tree context clustering, together with speaker adaptive training, into the training procedure of the average voice model. Subjective tests show that the average voice model trained with the proposed method generates more natural-sounding speech than the conventional average voice model. Moreover, the voice characteristics of synthetic speech generated from the model adapted with the proposed method are closer to the target speaker than those obtained with the conventional method.
Keywords :
decision trees; pattern clustering; probability; speech intelligibility; speech synthesis; average voice model; natural sounding speech; shared decision tree context clustering; speaker adaptive training; subjective tests; training method; Adaptation model; Context modeling; Databases; Degradation; Hidden Markov models; Loudspeakers; Maximum likelihood linear regression; Training data
Conference_Title :
Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
Print_ISBN :
0-7803-7663-3
DOI :
10.1109/ICASSP.2003.1198881