A method to choose an appropriate concatenating position for automatically generated synthesis units

Author

Hun Jae Park ; Sang Hun Kim ; Min Soo Han ; Jae Ho Chung

Author_Institution

Bumil Inf. & Commun. Co., Ltd., Seoul, South Korea

fYear

1998

fDate

8-11 Sept. 1998

Firstpage

1

Lastpage

3

Abstract

To make synthesized speech sound more natural, a larger speech database is preferable. However, building a large database by manual segmentation is very time-consuming, and the resulting database tends to be inconsistent. Consequently, automatic segmentation using a speech recognition system has been used to build large speech synthesis databases. However, when automatically segmented units are used to synthesize speech, significant concatenation distortion occurs because of phoneme boundary errors. The purpose of our study is to choose an appropriate concatenating position for automatically generated synthesis units, which may have errors at their boundaries. We performed MOS (Mean Opinion Score) tests and analyzed spectrograms to evaluate the proposed algorithm. The test results show that speech concatenated with the proposed algorithm is of higher quality than speech concatenated using automatic segmentation alone.

Keywords

signal processing; speech recognition; speech synthesis; MOS tests; automatic segmentation; automatically generated synthesis unit; concatenating position; manual segmentation method; mean opinion score tests; phoneme boundary error; speech recognition system; speech synthesis data base; synthesized speech sounds; Cepstrum; Heuristic algorithms; Manuals; Spectrogram; Speech; Speech recognition; Telecommunications

fLanguage

English

Publisher

ieee

Conference_Titel

Signal Processing Conference (EUSIPCO 1998), 9th European

Conference_Location

Rhodes

Print_ISBN

978-960-7620-06-4

Type

conf

Filename

7089989