Title :
Representing fundamental frequency contours generated by HMM-based speech synthesis using generation process model
Author :
Hirose, Keikichi ; Matsuda, Tatsuya ; Hashimoto, Hiroya ; Minematsu, Nobuaki
Author_Institution :
Dept. of Inf. & Commun. Eng., Univ. of Tokyo, Tokyo, Japan
Abstract :
Frame-by-frame representation is not appropriate for prosodic features, which are tightly related to speech units spanning a wide time range, such as words and phrases. This causes an inherent problem in fundamental frequency (F0) contour generation by HMM-based speech synthesis. A method is developed to modify F0 contours in the framework of a generation process model by referring to linguistic information of the input text (word boundary and accent type). It takes the F0 variances obtained through HMM-based speech synthesis into account during the process. A listening experiment on synthetic speech shows that, on average, the method yields better quality than HMM-based speech synthesis. Since the generation process model can clearly relate its commands to linguistic (and para-/non-linguistic) information, the method has an additional advantage: changing speech styles and/or adding further information (such as emphasis) can be done easily by manipulating the commands.
Keywords :
hidden Markov models; speech synthesis; HMM; accent type; command manipulation; fundamental frequency contour generation; fundamental frequency contour representation; generation process model; word boundary; Frequency synthesizers; Mathematical model; Pragmatics; Speech; HMM-based speech synthesis; flexible control; fundamental frequency contour; linguistic information
Conference_Title :
Machine Learning for Signal Processing (MLSP), 2011 IEEE International Workshop on
Conference_Location :
Santander
Print_ISBN :
978-1-4577-1621-8
Electronic_ISBN :
1551-2541
DOI :
10.1109/MLSP.2011.6064596