DocumentCode :
1511068
Title :
Vocal Tract Length Normalization for Statistical Parametric Speech Synthesis
Author :
Saheer, Lakshmi ; Dines, John ; Garner, Philip N.
Author_Institution :
Idiap Res. Inst., Martigny, Switzerland
Volume :
20
Issue :
7
fYear :
2012
Firstpage :
2134
Lastpage :
2148
Abstract :
Vocal tract length normalization (VTLN) has been successfully used in automatic speech recognition for improved performance. The same technique can be implemented in statistical parametric speech synthesis for rapid speaker adaptation during synthesis. This paper presents an efficient implementation of VTLN using expectation maximization and addresses the key challenges faced in implementing VTLN for synthesis. Jacobian normalization, high-dimensionality features and truncation of the transformation matrix are a few challenges presented with the appropriate solutions. Detailed evaluations are performed to estimate the most suitable technique for using VTLN in speech synthesis. Evaluating VTLN in the framework of speech synthesis is also not an easy task since the technique does not work equally well for all speakers. Speakers have been selected based on different objective and subjective criteria to demonstrate the difference between systems. The best method for implementing VTLN is confirmed to be the use of lower-order features for estimating warping factors.
Keywords :
Jacobian matrices; speaker recognition; speech synthesis; statistical analysis; Jacobian normalization; VTLN; automatic speech recognition; expectation maximization; rapid speaker adaptation; statistical parametric speech synthesis; transformation matrix; vocal tract length normalization; Feature extraction; Hidden Markov models; Maximum likelihood estimation; Mel frequency cepstral coefficient; Transforms; Expectation–maximization optimization; hidden Markov model (HMM)-based statistical parametric speech synthesis; speaker adaptation
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2012.2198058
Filename :
6196182