DocumentCode :
1010128
Title :
Toward spontaneous speech synthesis-utilizing language model information in TTS
Author :
Werner, Steffen ; Eichner, Matthias ; Wolff, Matthias ; Hoffmann, Ruediger
Author_Institution :
Tech. Univ. Dresden, Germany
Volume :
12
Issue :
4
fYear :
2004
fDate :
7/1/2004
Firstpage :
436
Lastpage :
445
Abstract :
State-of-the-art speech synthesis systems achieve a high overall quality. However, synthesized speech still lacks naturalness. To produce more natural and colloquial synthetic speech, our research focuses on the integration of effects present in spontaneous speech. Conventional speech synthesis systems do not consider the probability of a word in its context. Recent investigations of corpora of natural speech showed that words that are very likely to occur in a given context are pronounced faster and less accurately than improbable ones. In this paper, three approaches are introduced to model this effect found in spontaneous speech. The first algorithm changes the speaking rate directly by shortening or lengthening the syllables of a word depending on the language model probability of that word. Since probable words are not only pronounced faster but also less accurately, this approach was extended by selecting appropriate pronunciation variants of a word according to the language model probability. This second algorithm changes the local speaking rate indirectly by controlling the grapheme-phoneme conversion. In a third stage, a pronunciation sequence model was used to select the appropriate variants according to their sequence probability. In listening experiments, test participants were asked to rate the synthesized speech in the categories "colloquial impression" and "naturalness". Our approaches achieved a significant improvement in the category "colloquial impression". However, no significantly higher naturalness could be observed. The observed effects are discussed in detail.
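The abstract names three mechanisms but gives no formulas. As a hedged illustration only, the Python sketch below shows one way each could look: a log-linear mapping from word probability to a syllable-duration factor (approach 1), a probability-dependent choice among canonical and reduced pronunciation variants (approach 2), and a bigram Viterbi search over variant sequences (approach 3). The scale bounds (0.8 to 1.2), the probability range, the `floor` value, all function names, and the German-like SAMPA variant strings are assumptions for illustration, not the authors' published method.

```python
import math

# All constants below are illustrative assumptions; the abstract does not
# state the paper's actual scaling range or probability thresholds.
MIN_SCALE = 0.8   # most probable words: syllables shortened to 80 %
MAX_SCALE = 1.2   # least probable words: syllables lengthened to 120 %
P_LOW, P_HIGH = 1e-4, 1e-1   # probability range mapped onto the scale


def _position(prob: float) -> float:
    """Normalised log-probability: 0.0 at or below P_LOW, 1.0 at or above P_HIGH."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    lo, hi = math.log(P_LOW), math.log(P_HIGH)
    x = (math.log(prob) - lo) / (hi - lo)
    return min(1.0, max(0.0, x))


def duration_scale(prob: float) -> float:
    """Approach 1 (hypothetical mapping): likely words get shorter syllables."""
    return MAX_SCALE - _position(prob) * (MAX_SCALE - MIN_SCALE)


def scale_syllables(durations_ms, prob):
    """Shorten or lengthen every syllable of one word by the same factor."""
    s = duration_scale(prob)
    return [d * s for d in durations_ms]


def select_variant(variants, prob):
    """Approach 2 (hypothetical rule): pick a pronunciation variant.

    `variants` is ordered from canonical (most careful) to most reduced;
    the more probable the word, the more reduced the chosen variant.
    """
    idx = round(_position(prob) * (len(variants) - 1))
    return variants[idx]


def select_variant_sequence(word_variants, bigram_logp, floor=-20.0):
    """Approach 3 (hypothetical model): Viterbi search for the best variant sequence.

    `word_variants` holds one list of candidate variants per word;
    `bigram_logp` maps (previous_variant, variant) to a log-probability,
    with "<s>" as the sentence start. Unseen pairs get the assumed `floor`.
    """
    best = {"<s>": (0.0, [])}   # variant -> (best score, best path ending in it)
    for candidates in word_variants:
        new_best = {}
        for v in candidates:
            # Extend every surviving partial path by v and keep the best one.
            new_best[v] = max(
                (score + bigram_logp.get((prev, v), floor), path + [v])
                for prev, (score, path) in best.items()
            )
        best = new_best
    return max(best.values())[1]
```

With these assumed settings, a word with probability 0.1 or higher gets its syllables scaled by 0.8 and the most reduced variant, while a word at 1e-4 or lower keeps the canonical variant at 1.2 times its base duration. The mapping works on log-probabilities because language model probabilities span several orders of magnitude.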
Keywords :
natural languages; probability; speech synthesis; text analysis; TTS; colloquial impression; colloquial synthetic speech; grapheme-phoneme conversion; language model information; language model probability; natural speech corpora; natural synthetic speech; sequence probability; speaking rate; speech synthesis systems; spontaneous speech synthesis; Control system synthesis; Helium; Natural languages; Predictive models; Speech processing; Speech synthesis; Testing;
fLanguage :
English
Journal_Title :
Speech and Audio Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1063-6676
Type :
jour
DOI :
10.1109/TSA.2004.828635
Filename :
1306516