DocumentCode
1937786
Title
Speech-rate-variable HMM-based Japanese TTS system
Author
Iwano, K. ; Yamada, Makoto ; Togawa, T. ; Furui, S.
Author_Institution
Tokyo Institute of Technology
fYear
2002
fDate
11-13 Sept. 2002
Firstpage
219
Lastpage
222
Abstract
This paper proposes a new method for controlling phoneme duration according to an arbitrary target speech rate in speech synthesis (TTS, text-to-speech) systems. The proposed method first constructs three fundamental duration models at "fast", "normal", and "slow" speech rates using Hayashi's quantification theory (type 1) based on real speech databases, and then creates a duration model for a target speech rate by interpolating the fundamental models. Our TTS system uses an HMM-based synthesizer, which can achieve flexible prosody control. Speech synthesized by the proposed method is evaluated in subjective experiments at four speech rates, using paired-comparison tests between the proposed method and a rule-based method. The results show that the proposed method achieves higher naturalness in synthesized speech than the rule-based method.
Keywords
hidden Markov models; natural languages; speech processing; speech synthesis; HMM-based synthesizer; Hayashi quantification theory; Japanese language; TTS; arbitrary target speech rate; duration models; flexible prosody control; interpolation; naturalness; phoneme duration control; real speech databases; text-to-speech systems; Aging; Computer science; Control system synthesis; Databases; Hidden Markov models; Speech analysis; Speech synthesis; Synthesizers; Testing; Vocoders
fLanguage
English
Publisher
IEEE
Conference_Titel
Speech Synthesis, 2002. Proceedings of 2002 IEEE Workshop on
Print_ISBN
0-7803-7395-2
Type
conf
DOI
10.1109/WSS.2002.1224413
Filename
1224413