DocumentCode :
454678
Title :
Constructing a Phonetic-Rich Speech Corpus While Controlling Time-Dependent Voice Quality Variability for English Speech Synthesis
Author :
Ni, Jinfu ; Hirai, Toshio ; Kawai, Hisashi
Author_Institution :
ATR Spoken Language Commun. Res. Lab.
Volume :
1
fYear :
2006
fDate :
14-19 May 2006
Abstract :
This paper presents a practical approach to constructing a large-scale speech corpus for corpus-based speech synthesis. The approach consists of four steps: (1) selecting a source text corpus that fits limited target domains; (2) analyzing the source text corpus to obtain the unit statistics; (3) automatically extracting prompt subjects (sentences) from the source text corpus to maximize the intended unit coverage within the given amount of text; and (4) recording the prompt subjects while controlling the critical factors that cause undesirable voice variability. The paper also describes related computational methods, such as a greedy algorithm for prompt selection, the proximity effects found in a real recording system, and a technique for detecting time-dependent voice variations. Although the approach is demonstrated for English, it is also promising for other languages.
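The prompt-selection step (3) can be illustrated with a minimal sketch of a generic greedy coverage heuristic. This is not the authors' implementation: the abstract gives no details of their algorithm, so the function names `units_of` and `greedy_select`, the use of adjacent-letter pairs as a stand-in for phonetic units, the unweighted coverage gain, and the `max_prompts` budget are all assumptions made for illustration.

```python
def units_of(sentence):
    """Hypothetical unit extractor: adjacent-letter pairs stand in for
    phonetic units (a real system would use a phonetic transcription)."""
    letters = [c for c in sentence.lower() if c.isalpha()]
    return {a + b for a, b in zip(letters, letters[1:])}


def greedy_select(sentences, max_prompts):
    """Repeatedly pick the sentence that adds the most uncovered units.

    Stops when the budget (max_prompts, an assumed stand-in for 'the given
    amount of text') is reached or no sentence adds a new unit.
    """
    candidates = {s: units_of(s) for s in sentences}  # also drops duplicates
    covered, chosen = set(), []
    while candidates and len(chosen) < max_prompts:
        # Coverage gain of a sentence = units it has that are not yet covered.
        best = max(candidates, key=lambda s: len(candidates[s] - covered))
        gain = candidates[best] - covered
        if not gain:
            break  # nothing left to gain from any remaining sentence
        chosen.append(best)
        covered |= gain
        del candidates[best]
    return chosen, covered


if __name__ == "__main__":
    prompts, covered = greedy_select(["ab", "abc", "cd", "ab"], max_prompts=2)
    print(prompts, sorted(covered))
```

A weighted variant, in which each unit's gain is scaled by its frequency in the source text corpus (the unit statistics of step 2), would be a natural extension, but whether the paper uses such weighting is not stated in the abstract.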
Keywords :
greedy algorithms; natural languages; speech synthesis; statistics; English speech synthesis; greedy algorithm; phonetic-rich speech corpus; prompt selection; real recording system; time-dependent voice quality variability; unit statistics; Automatic control; Communication system control; Degradation; Laboratories; Large-scale systems; Natural languages; Research and development; Speech analysis; Speech synthesis; Statistical analysis
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on
Conference_Location :
Toulouse
ISSN :
1520-6149
Print_ISBN :
1-4244-0469-X
Type :
conf
DOI :
10.1109/ICASSP.2006.1660162
Filename :
1660162