Title :
Parameter selection for prosodic modelling in a restricted-domain Spanish text-to-speech system
Author :
Montero, J.M. ; de Cordoba, R. ; Macias-Guarasa, Javier ; San-Segundo, R. ; Gutierrez-Arriola, J. ; Pardo, J.M.
Author_Institution :
Speech Technology Group, Electronic Engineering Dept., Universidad Politecnica de Madrid, E.T.S.I. Telecomunicacion, Ciudad Universitaria, 28040-Madrid, Spain
fDate :
June 28 2004-July 1 2004
Abstract :
Prosodic modeling is one of the most important tasks in developing a new text-to-speech synthesizer, especially for a high-quality, female-voice, restricted-domain system. Our twofold objective is to obtain accurate predictors for both the F0 curve and phoneme duration by minimizing the model estimation error in a Spanish text-to-speech system. To achieve these complementary aims, we needed to find the factors that most influence prosodic values in a given language. We used neural networks and experimented with different combinations of parameters to find those that yield the minimum estimation error. In a restricted-domain environment, the variation among patterns is reduced and there are more instances of each parameter vector in the database, which makes the neural network an excellent modeling tool. The resulting system predicts prosody with very good results (for duration: an RMS error of 15.5 ms and a correlation factor of 0.8975; for F0: an RMS error of 19.80 Hz and a relative RMS error of 0.43), clearly improving on our previous rule-based system.
Keywords :
Artificial neural networks; Databases; Decision trees; Knowledge based systems; Natural languages; Neural networks; Speech synthesis; Telecommunications; Timing; F0 modeling; Prosody; artificial neural networks; duration modeling; parameter coding; parameter selection; text-to-speech;
Conference_Titel :
Automation Congress, 2004. Proceedings. World
Conference_Location :
Seville
Print_ISBN :
1-889335-21-5