Speech-rate-variable HMM-based Japanese TTS system

Author

Iwano, K. ; Yamada, Makoto ; Togawa, T. ; Furui, S.

Author_Institution

Tokyo Institute of Technology

fYear

2002

fDate

11-13 Sept. 2002

Firstpage

219

Lastpage

222

Abstract

This paper proposes a new method for controlling phoneme duration according to an arbitrary target speech rate in speech synthesis (TTS, text-to-speech) systems. The proposed method first constructs three fundamental duration models at "fast", "normal", and "slow" speech rates using Hayashi's quantification theory (type 1) based on real speech databases, and then creates a duration model for a target speech rate by interpolating the fundamental models. Our TTS system uses an HMM-based synthesizer, which can achieve flexible prosody control. Speech synthesized by the proposed method is evaluated in subjective experiments at four speech rates, using paired-comparison tests between the proposed method and a rule-based method. The results show that the proposed method achieves higher naturalness in synthesized speech than the rule-based method.
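
The interpolation step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the interpolation formula, so the piecewise-linear form below is an assumption, as are the function name `interpolate_duration`, the speech-rate unit (morae per second), and all numeric values. It interpolates the durations predicted by the three fundamental models rather than the model coefficients themselves.

```python
from bisect import bisect_right


def interpolate_duration(target_rate, anchors):
    """Piecewise-linear interpolation of one phoneme's duration.

    anchors: (speech_rate, predicted_duration_ms) pairs, one per
    fundamental model (e.g. slow, normal, fast). Rates must be distinct.
    This is an illustrative assumption, not the paper's stated formula.
    """
    pts = sorted(anchors)
    if len(pts) < 2:
        raise ValueError("need at least two anchor models")
    rates = [r for r, _ in pts]
    # Pick the segment containing target_rate; clamp the index so that
    # rates outside the anchor range extrapolate from the nearest segment.
    hi = min(max(bisect_right(rates, target_rate), 1), len(pts) - 1)
    (r0, d0), (r1, d1) = pts[hi - 1], pts[hi]
    t = (target_rate - r0) / (r1 - r0)
    return d0 + t * (d1 - d0)


# Hypothetical predictions for one phoneme from the three fundamental models:
# (rate in morae/s, duration in ms). These numbers are made up for illustration.
anchors = [(5.0, 120.0), (8.0, 80.0), (11.0, 60.0)]
between_slow_and_normal = interpolate_duration(6.5, anchors)
between_normal_and_fast = interpolate_duration(9.5, anchors)
```

A target rate that coincides with one of the anchor rates returns that model's own prediction, so the three fundamental models are reproduced exactly at their own speech rates.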

Keywords

hidden Markov models; natural languages; speech processing; speech synthesis; HMM-based synthesizer; Hayashi quantification theory; Japanese language; TTS; arbitrary target speech rate; duration models; flexible prosody control; interpolation; naturalness; phoneme duration control; real speech databases; text-to-speech systems; Aging; Computer science; Control system synthesis; Databases; Hidden Markov models; Speech analysis; Speech synthesis; Synthesizers; Testing; Vocoders

fLanguage

English

Publisher

IEEE

Conference_Titel

Speech Synthesis, 2002. Proceedings of 2002 IEEE Workshop on

Print_ISBN

0-7803-7395-2

Type

conf

DOI

10.1109/WSS.2002.1224413

Filename

1224413