Improving naturalness in text-to-speech synthesis using natural glottal source

Author

Matsui, Kenji ; Pearson, Stephen D. ; Hata, Kazue ; Kamai, Takahiro

Author_Institution

Matsushita Electric Industrial Co., Ltd., Osaka, Japan

fYear

1991

fDate

14-17 Apr 1991

Firstpage

769

Abstract

Methods for improving the naturalness of text-to-speech synthesis and its ability to model individual speakers are discussed. The methods use a natural glottal source that is extracted from natural speech by inverse filtering. One method uses a repeating loop. Another creates a source waveform of the desired pitch by concatenating single pulses. A multisource method that cross-fades between different types of glottal source is also proposed. Perceptual listening tests were performed with synthetic stimuli. The preliminary results show that these methods have the potential to improve the quality of text-to-speech synthesis.
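
The pulse-concatenation and cross-fading ideas named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation, since the abstract gives no algorithmic detail. The function names, the Hann-shaped stand-in pulse, the overlap-add placement, and the linear fade are all assumptions made for illustration. A real system would use a pulse obtained by inverse-filtering recorded speech.

```python
import math


def synthetic_pulse(length):
    """Hypothetical stand-in for one inverse-filtered glottal pulse."""
    # Hann-shaped bump; assumed here only so the example is self-contained.
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def concatenate_pulses(pulse, f0_hz, sample_rate, duration_s):
    """Build a source waveform by placing one pulse per pitch period."""
    period = round(sample_rate / f0_hz)    # samples between pulse onsets
    total = round(duration_s * sample_rate)
    out = [0.0] * total
    for start in range(0, total, period):
        for i, value in enumerate(pulse):
            if start + i < total:
                out[start + i] += value    # overlap-add if pulses overlap
    return out


def cross_fade(source_a, source_b):
    """Linearly fade from source_a to source_b over their common length."""
    n = min(len(source_a), len(source_b))
    if n < 2:
        return list(source_a[:n])
    return [source_a[i] * (1 - i / (n - 1)) + source_b[i] * (i / (n - 1))
            for i in range(n)]


# Example: a 100 Hz source at an assumed 8 kHz sample rate.
pulse = synthetic_pulse(40)
source = concatenate_pulses(pulse, f0_hz=100, sample_rate=8000, duration_s=0.05)
```

Changing `f0_hz` changes only the spacing between pulse onsets, which is one simple way to reach a desired pitch from a single stored pulse. A cross-fade of this kind could blend two such sources with different voice qualities.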

Keywords

speech synthesis; concatenating single pulses; cross-fading; inverse-filtering; multisource method; natural glottal source; perceptual listening tests; repeating loop; source waveform; synthetic stimuli; text-to-speech synthesis; Concatenated codes; Design methodology; Frequency; Instruments; Interpolation; Laboratories; Natural languages; Performance evaluation; Speech synthesis; Testing

fLanguage

English

Publisher

IEEE

Conference_Title

Acoustics, Speech, and Signal Processing, 1991. ICASSP-91., 1991 International Conference on

Conference_Location

Toronto, Ont.

ISSN

1520-6149

Print_ISBN

0-7803-0003-3

Type

conf

DOI

10.1109/ICASSP.1991.150452

Filename

150452