DocumentCode
1059792
Title
Analysis of Speaker Adaptation Algorithms for HMM-Based Speech Synthesis and a Constrained SMAPLR Adaptation Algorithm
Author
Yamagishi, Junichi ; Kobayashi, Takao ; Nakano, Yuji ; Ogata, Katsumi ; Isogai, Juri
Author_Institution
Center for Speech Technology Research, University of Edinburgh, Edinburgh
Volume
17
Issue
1
fYear
2009
Firstpage
66
Lastpage
83
Abstract
In this paper, we analyze the effects of several factors and configuration choices encountered during training and model construction, with the aim of obtaining better and more stable adaptation in HMM-based speech synthesis. We then propose a new adaptation algorithm called constrained structural maximum a posteriori linear regression (CSMAPLR), whose derivation is based on the knowledge obtained in this analysis and on the results of comparing several conventional adaptation algorithms. We investigate six major aspects of speaker adaptation: the initial models; the amount of training data for the initial models; the transform functions, estimation criteria, and sensitivity of several linear regression adaptation algorithms; and combination algorithms. To analyze the effect of the initial model, we compare speaker-dependent models, gender-independent models, and the simultaneous use of gender-dependent models with the use of a single gender-dependent model. To analyze the effect of the transform functions, we compare a transform applied only to mean vectors with one applied to both mean vectors and covariance matrices. To analyze the effect of the estimation criteria, we compare the maximum likelihood (ML) criterion with a robust estimation criterion called structural maximum a posteriori (SMAP). We evaluate the sensitivity of the piecewise linear regression algorithms to several thresholds and examine methods that combine MAP adaptation with the linear regression algorithms. We incorporate these adaptation algorithms into our speech synthesis system and present several subjective and objective evaluation results showing the utility and effectiveness of these algorithms in speaker adaptation for HMM-based speech synthesis.
Keywords
hidden Markov models; regression analysis; speech synthesis; constrained structural maximum a posteriori linear regression; estimation criteria; gender-independent models; model construction; regression adaptation algorithms; speaker adaptation algorithms; speaker-dependent models; transform functions; Adaptation model; Algorithm design and analysis; Covariance matrix; Linear regression; Maximum likelihood estimation; Speech analysis; Training data; Vectors; Average voice; hidden Markov model (HMM)-based speech synthesis; speaker adaptation; voice conversion
fLanguage
English
Journal_Title
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher
IEEE
ISSN
1558-7916
Type
jour
DOI
10.1109/TASL.2008.2006647
Filename
4740153