• DocumentCode
    231571
  • Title
    Use of fundamental frequencies shaped by generation process model for HMM-based speech synthesis
  • Author
    Hirose, Keikichi ; Hashimoto, Hiroya ; Hyakutake, Kyota ; Saito, Daisuke ; Minematsu, Nobuaki
  • Author_Institution
    Dept. of Inf. & Commun. Eng., Univ. of Tokyo, Tokyo, Japan
  • fYear
    2014
  • fDate
    19-23 Oct. 2014
  • Firstpage
    555
  • Lastpage
    560
  • Abstract
    The generation process model of fundamental frequency (F0) contours is known to represent the global movements of F0 while keeping a clear relation with the linguistic information of utterances. While HMM-based speech synthesis can generate speech of good quality, problems arising from its frame-by-frame processing have been pointed out. These problems are expected to be solved by incorporating the model constraints. A method is developed that uses F0 contours approximated by the model, instead of observed F0 contours, for HMM training. Listening experiments show a clear improvement in the quality of synthetic speech. In this method, fragments of F0 contours not represented by the model (F0 residuals) are ignored. A scheme is further introduced to cope with this issue: F0 residuals are also included in the training and synthesis processes of HMM-based speech synthesis, and the generated F0 residuals are added to the model-based F0s before waveform generation. The model constraint has another merit: the relations between generated F0 contours and texts are clear, so it is possible to add linguistic information such as emphasis to synthetic speech, or to change speaking styles, by manipulating F0s in the F0 model framework. Several experimental results supporting the advantages of the method are shown.
  • Keywords
    hidden Markov models; speech synthesis; HMM training; HMM-based speech synthesis; frame-by-frame processing; fundamental frequency contour generation process model; hidden Markov model; listening experiment; speaking style change; synthetic speech quality; utterance linguistic information; waveform generation; Feature extraction; Hidden Markov models; Pragmatics; Speech; Speech synthesis; Training; F0 residual; Flexible F0 control; Generation process model; HMM-based speech synthesis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal Processing (ICSP), 2014 12th International Conference on
  • Conference_Location
    Hangzhou
  • ISSN
    2164-5221
  • Print_ISBN
    978-1-4799-2188-1
  • Type
    conf
  • DOI
    10.1109/ICOSP.2014.7015066
  • Filename
    7015066