Title :
Use of fundamental frequencies shaped by generation process model for HMM-based speech synthesis
Author :
Hirose, Keikichi ; Hashimoto, Hiroya ; Hyakutake, Kyota ; Saito, Daisuke ; Minematsu, Nobuaki
Author_Institution :
Dept. of Inf. & Commun. Eng., Univ. of Tokyo, Tokyo, Japan
Abstract :
The generation process model of fundamental frequency (F0) contours is known to represent the global movements of F0 while keeping a clear relation to the linguistic information of utterances. Although HMM-based speech synthesis can generate speech of good quality, problems arising from its frame-by-frame processing have been pointed out. These problems are expected to be solved by incorporating the model constraints. A method is developed that uses F0 contours approximated by the model, instead of observed F0 contours, for HMM training. Listening experiments show a clear improvement in the quality of synthetic speech. This method ignores the fragments of F0 contours that the model does not represent (F0 residuals). A scheme is therefore introduced to address this issue: F0 residuals are also included in the training and synthesis processes of HMM-based speech synthesis, and the generated F0 residuals are added to the model-based F0 values before waveform generation. The model constraint has another merit: the relations between generated F0 contours and texts are clear, so it is possible to add linguistic information such as emphasis to synthetic speech, or to change speaking styles, by manipulating F0 within the F0 model framework. Several experimental results supporting the advantages of the method are shown.
Keywords :
hidden Markov models; speech synthesis; HMM training; HMM-based speech synthesis; frame-by-frame processing; fundamental frequency contour generation process model; hidden Markov model; listening experiment; speaking style change; synthetic speech quality; utterance linguistic information; waveform generation; Feature extraction; Pragmatics; Speech; Training; F0 residual; Flexible F0 control; Generation process model
Conference_Title :
Signal Processing (ICSP), 2014 12th International Conference on
Conference_Location :
Hangzhou
Print_ISBN :
978-1-4799-2188-1
DOI :
10.1109/ICOSP.2014.7015066