DocumentCode
134222
Title
Superpositional HMM-based intonation synthesis using a functional F0 model
Author
Ni, Jinfu ; Shiga, Yoshinori ; Hori, Chiori
Author_Institution
Spoken Language Commun. Lab., Universal Commun. Res. Inst., Kyoto, Japan
fYear
2014
fDate
12-14 Sept. 2014
Firstpage
270
Lastpage
274
Abstract
This paper addresses intonation synthesis that combines statistical and functional approaches, allowing manipulation of fundamental frequency (F0) contours in HMM-based speech synthesis. An F0 contour is represented as the sum of micro, accent, and register components on a logarithmic scale, a representation rooted in the Fujisaki model. Separate context-dependent (CD) HMMs are trained for each type of component, with the components extracted from a speech corpus using a functional F0 model. At synthesis time, the CDHMM-generated micro, accent, and register components are superimposed to form F0 contours for the input text. Objective and subjective evaluations are carried out on a Japanese speech corpus. Compared with the conventional approach, the method improves the naturalness of synthetic speech by achieving better global F0 behavior, and it allows flexible intonation manipulation by modifying the functional model parameters.
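The superposition step described in the abstract can be illustrated with a minimal sketch. It assumes each component is a per-frame track in natural-log units and that the three tracks are already time-aligned. The function name `superimpose_log_f0`, the `accent_gain` parameter, and the sample values are hypothetical and are not taken from the paper; `accent_gain` only stands in for the idea of manipulating intonation by changing a component, not for the paper's functional model parameters.

```python
import math


def superimpose_log_f0(micro, accent, register, accent_gain=1.0):
    """Sum micro, accent, and register log-F0 tracks and return F0 in Hz.

    accent_gain is a hypothetical knob (not from the paper) that scales the
    accent component before summation to mimic a simple intonation change.
    """
    if not (len(micro) == len(accent) == len(register)):
        raise ValueError("component tracks must have the same number of frames")
    # Superposition happens in the log domain; exp() maps back to Hz.
    return [
        math.exp(m + accent_gain * a + r)
        for m, a, r in zip(micro, accent, register)
    ]


# Made-up 5-frame tracks for illustration only (not data from the paper).
register = [math.log(120.0)] * 5          # slowly varying baseline
accent = [0.0, 0.1, 0.2, 0.1, 0.0]        # local rise and fall
micro = [0.0, 0.01, -0.01, 0.0, 0.0]      # small segmental perturbations

f0_hz = superimpose_log_f0(micro, accent, register)
f0_emphasized = superimpose_log_f0(micro, accent, register, accent_gain=2.0)
```

Because the components add in the log domain, scaling the accent track multiplies the resulting F0 excursion in Hz rather than adding a fixed offset, while frames with zero accent are left unchanged.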
Keywords
hidden Markov models; natural language processing; speech processing; Fujisaki model; HMM-based speech synthesis; Japanese speech corpus; context-dependent HMM; functional F0 model; functional model parameters; fundamental frequency; intonation manipulation; superpositional HMM-based intonation synthesis; Correlation; Frequency synthesizers; Registers; Speech; Speech synthesis; Training; Intonation synthesis; making focal prominence; prosody
fLanguage
English
Publisher
IEEE
Conference_Titel
Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on
Conference_Location
Singapore
Type
conf
DOI
10.1109/ISCSLP.2014.6936614
Filename
6936614