Regional Information Center for Science and Technology (RICeST) - The IBM expressive text-to-speech synthesis system for American English

DocumentCode :

980715

Title :

The IBM expressive text-to-speech synthesis system for American English

Author :

Pitrelli, John F. ; Bakis, Raimo ; Eide, Ellen M. ; Fernandez, Raul ; Hamza, Wael ; Picheny, Michael A.

Author_Institution :

IBM T. J. Watson Res. Center, Yorktown Heights, NY

Volume :

Issue :

fYear :

2006

fDate :

7/1/2006

Firstpage :

1099

Lastpage :

1108

Abstract :

Expressive text-to-speech (TTS) synthesis should contribute to the pleasantness, intelligibility, and speed of speech-based human-machine interactions which use TTS. We describe a TTS engine which can be directed, via text markup, to use a variety of expressive styles, here, questioning, contrastive emphasis, and conveying good and bad news. Differences in these styles lead us to investigate two approaches for expressive TTS, a "corpus-driven" and a "prosodic-phonology" approach. Each speaker records 11 h (excluding silences) of "neutral" sentences. In the corpus-driven approach, the speaker also records 1-h corpora in each expressive style; these segments are tagged by style for use during search, and decision trees for determining f₀ contours and timing are trained separately for each of the neutral and expressive corpora. In the prosodic-phonology approach, rules translating certain expressive markup elements to tones and break indices (ToBI) are manually determined, and the ToBI elements are used in single f₀ and duration trees for all expressions. Tests show that listeners identify synthesis in particular styles ranging from 70% correctly for "conveying bad news" to 85% for "yes-no questions". Further improvements are demonstrated through the use of speaker-pooled f₀ and duration models.

Keywords :

decision trees; linguistics; speech synthesis; American English; IBM; corpus-driven approach; prosodic-phonology approach; text-to-speech synthesis system; tones and break indices; Bandwidth; Engines; Humans; Information systems; Marketing and sales; Mood; Testing; Timing; Corpus-driven text-to-speech (TTS); expressive speech synthesis; prosodic phonology; text-to-speech (TTS); tones and break indices (ToBI)

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2006.876123

Filename :

1643639

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=980715