Regional Information Center for Science and Technology - Toward spontaneous speech Synthesis-utilizing language model information in TTS

DocumentCode :

1010128

Title :

Toward spontaneous speech Synthesis-utilizing language model information in TTS

Author :

Werner, Steffen ; Eichner, Matthias ; Wolff, Matthias ; Hoffmann, Ruediger

Author_Institution :

Tech. Univ. Dresden, Germany

Volume :

Issue :

fYear :

2004

fDate :

7/1/2004

Firstpage :

436

Lastpage :

445

Abstract :

State-of-the-art speech synthesis systems achieve high overall quality. However, synthesized speech still lacks naturalness. To produce more natural and colloquial synthetic speech, our research focuses on integrating effects present in spontaneous speech. Conventional speech synthesis systems do not consider the probability of a word in its context. Recent investigations on corpora of natural speech showed that words that are very likely to occur in a given context are pronounced less accurately and faster than improbable ones. In this paper, three approaches are introduced to model this effect. The first algorithm changes the speaking rate directly by shortening or lengthening the syllables of a word depending on the language model probability of that word. Since probable words are not only pronounced faster but also less accurately, this approach was extended by selecting appropriate pronunciation variants of a word according to the language model probability. This second algorithm changes the local speaking rate indirectly by controlling the grapheme-phoneme conversion. In a third stage, a pronunciation sequence model was used to select the appropriate variants according to their sequence probability. In listening experiments, test participants were asked to rate the synthesized speech in the categories "colloquial impression" and "naturalness". Our approaches achieved a significant improvement in the category "colloquial impression". However, no significantly higher naturalness could be observed. The observed effects are discussed in detail.
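The first algorithm in the abstract ties syllable duration to a word's language model probability. The sketch below illustrates that idea under assumptions of our own: a bigram model with invented log-probabilities, a fixed floor for unseen bigrams, and a clamped linear mapping from log-probability to a duration factor. The record does not state the paper's language model, mapping, or parameter values, so every name and number here (`BIGRAM_LOGP`, `duration_scale`, `ref_logp`, `alpha`, and so on) is hypothetical.

```python
# Hypothetical bigram log-probabilities log10 P(word | previous word).
# These values are invented for illustration; they do not come from the paper.
BIGRAM_LOGP = {
    ("<s>", "i"): -0.8,
    ("i", "think"): -1.0,
    ("think", "that"): -0.7,
    ("that", "works"): -2.5,
}

UNSEEN_LOGP = -4.0  # assumed floor for bigrams missing from the table


def duration_scale(logp, ref_logp=-1.5, alpha=0.1, lo=0.8, hi=1.2):
    """Map a word's LM log-probability to a syllable-duration factor.

    Words more probable than the reference get a factor below 1 (shorter
    syllables, faster speech); less probable words get a factor above 1.
    The linear form, the clamping range, and all defaults are assumptions.
    """
    scale = 1.0 - alpha * (logp - ref_logp)
    return max(lo, min(hi, scale))


def scale_words(words):
    """Return (word, log-probability, duration factor) for each word."""
    result = []
    prev = "<s>"  # sentence-start context for the first word
    for w in words:
        logp = BIGRAM_LOGP.get((prev, w), UNSEEN_LOGP)
        result.append((w, logp, duration_scale(logp)))
        prev = w
    return result


def scale_syllables(durations_ms, factor):
    """Apply one word-level factor to all of that word's syllable durations."""
    return [d * factor for d in durations_ms]


if __name__ == "__main__":
    for word, logp, factor in scale_words(["i", "think", "that", "works"]):
        print(f"{word:>6}  logP={logp:5.2f}  scale={factor:.2f}")
```

Running the script gives the predictable words "i", "think", and "that" factors below 1 and the less predictable "works" a factor above 1. In a real system the factor would be applied to the syllable durations predicted by the prosody model (sketched here by the equally hypothetical `scale_syllables`); clamping keeps extreme probabilities from producing implausibly short or long syllables.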

Keywords :

natural languages; probability; speech synthesis; text analysis; TTS; colloquial impression; colloquial synthetic speech; grapheme-phoneme conversion; language model information; language model probability; natural speech corpora; natural synthetic speech; sequence probability; speaking rate; speech synthesis systems; spontaneous speech synthesis; Control system synthesis; Natural languages; Predictive models; Speech processing; Speech synthesis; Testing;

fLanguage :

English

Journal_Title :

IEEE Transactions on Speech and Audio Processing

Publisher :

IEEE

ISSN :

1063-6676

Type :

Journal article

DOI :

10.1109/TSA.2004.828635

Filename :

1306516

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1010128