DocumentCode
312303
Title
A new method of generating speech synthesis units based on phonological knowledge and clustering technique
Author
Yoshida, Yuki ; Nakajima, Shigeru ; Hakoda, Kazuo ; Hirokawa, Tomohisa
Author_Institution
NTT Human Interface Labs., Kanagawa, Japan
Volume
3
fYear
1996
fDate
3-6 Oct 1996
Firstpage
1712
Abstract
This paper proposes a new method for generating synthesis units using context-dependent phonemes to achieve high-quality text-to-speech (TTS) synthesis. If all phoneme triplets (triphones) in Japanese are considered, the number of synthesis units is very large; therefore, we introduce two techniques to reduce it. The first technique uses phonological knowledge to reduce approximately 15,000 triphones to about 6,000. The second technique is based on segment quantization, which reduces the number of units even further. Experimental tests show that the proposed method is effective in improving articulation and intelligibility scores, that the number of synthesis units can be decreased without significant loss in TTS quality, and that the preference score is proportional to the number of synthesis units.
Keywords
natural language interfaces; speech intelligibility; speech synthesis; Japanese; articulation; clustering technique; context dependent phonemes; experimental tests; phoneme triplets; phonological knowledge; quality; segment quantization; speech synthesis unit generation; text-to-speech synthesis; triphones; Ear; Laboratories; Microcomputers; Optimized production technology; Quantization; Stability; Tellurium; Testing
fLanguage
English
Publisher
IEEE
Conference_Titel
Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
Conference_Location
Philadelphia, PA
Print_ISBN
0-7803-3555-4
Type
conf
DOI
10.1109/ICSLP.1996.607957
Filename
607957