Regional Information Center for Science and Technology (RICEST) - A set of corpus-based text-to-speech synthesis technologies for Mandarin Chinese

DocumentCode :

857225

Title :

A set of corpus-based text-to-speech synthesis technologies for Mandarin Chinese

Author :

Chou, Fu-Chiang ; Tseng, Chiu-Yu ; Lee, Lin-shan

Author_Institution :

Voice Control, Philips Speech Process., Taipei, Taiwan

Volume :

Issue :

fYear :

2002

fDate :

10/1/2002

Firstpage :

481

Lastpage :

494

Abstract :

This paper presents a set of corpus-based text-to-speech synthesis technologies for Mandarin Chinese. A large speech corpus produced by a single speaker is used, and the speech output is synthesized from waveform units of variable lengths, with desired linguistic properties, retrieved from this corpus. Detailed methodologies were developed for designing "phonetically rich" and "prosodically rich" corpora by automatically selecting sentences from a large text corpus to include as many desired phonetic combinations and prosodic features as possible. Automatic phonetic labeling with iterative correction rules and automatic prosodic labeling with a multi-pass top-down procedure were also developed such that the labeling process for the corpora can be completely automatic. A hierarchical prosodic structure for an arbitrary desired text sentence is then generated based on the identification of different levels of break indices, and the prosodic feature sets and appropriate waveform units are finally selected and retrieved from the corpus, modified if necessary, and concatenated to produce the output speech. The special structure of Mandarin Chinese has been carefully considered in all these technologies, and preliminary assessments indicated very encouraging synthesized speech quality.
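The abstract does not spell out how sentences are chosen for the "phonetically rich" corpus, so the following is only an illustrative sketch of one common way to approach such a task: a greedy coverage heuristic that repeatedly picks the sentence contributing the most units not yet covered. The function name `select_sentences`, the dictionary representation, and the toy unit labels are all assumptions made for illustration, not the authors' method or data.

```python
def select_sentences(candidates, max_sentences):
    """Greedy coverage sketch (a generic heuristic, not the paper's algorithm).

    candidates: dict mapping a sentence ID to the set of units it contains.
    The units are a hypothetical stand-in for phonetic combinations or
    prosodic features.
    """
    covered = set()
    chosen = []
    remaining = dict(candidates)
    while remaining and len(chosen) < max_sentences:
        # Pick the sentence adding the most not-yet-covered units;
        # iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda s: len(remaining[s] - covered))
        gain = remaining[best] - covered
        if not gain:
            break  # no remaining sentence adds anything new
        chosen.append(best)
        covered |= gain
        del remaining[best]
    return chosen, covered


# Toy data: the unit labels are placeholders, not a real Mandarin inventory.
toy = {
    "s1": {"ba-ma", "ma-de", "de-shi"},
    "s2": {"ba-ma", "shi-ni"},
    "s3": {"ni-hao", "hao-ma", "ma-de"},
    "s4": {"de-shi"},
}
chosen, covered = select_sentences(toy, max_sentences=3)
print(chosen, len(covered))
```

A greedy pass like this does not guarantee the smallest possible sentence set, but it is simple and scales to large text corpora; a real system would presumably also weight units by frequency or prosodic importance, which the abstract does not detail.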

Keywords :

natural languages; reviews; speech intelligibility; speech synthesis; Mandarin Chinese; automatic phonetic labeling; automatic prosodic labeling; break indices; corpus-based text-to-speech synthesis; hierarchical prosodic structure; iterative correction rules; large speech corpus; linguistic properties; multi-pass top-down procedure; phonetic combinations; phonetically rich corpora; prosodic features; prosodically rich corpora; speech output; synthesized speech quality; variable length waveform units; Concatenated codes; Electronic mail; Labeling; Paper technology; Signal synthesis; Speech analysis; Speech processing; Speech synthesis; Web and internet services; Web pages

fLanguage :

English

Journal_Title :

Speech and Audio Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1063-6676

Type :

jour

DOI :

10.1109/TSA.2002.803437

Filename :

1045280

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=857225