Title :
Automatic Construction for a TTS Corpus with Limited Text
Author :
Zhang Wei ; Liu Yayu ; Deng Ye ; Pang Minhui
Author_Institution :
Dept. of Comput. Sci. & Technol., Ocean Univ. of China, Qin Dao, China
Abstract :
This paper presents a method for automatically constructing a text corpus from limited text for a speech synthesis system. The goal is to collect phonetically rich sentences that give high coverage of phonetic contextual units while keeping the text size small. We present a new greedy algorithm that selects sentences from a mother text. The mother text is downloaded automatically by a web crawler and then processed with speech-music discrimination and sentence segmentation; what remains forms the mother text. Our source text is therefore limited, which distinguishes this approach from the traditional construction of a speech corpus. The assembled mother text contains about 4612 sentences. The diphone is used as the basic unit, and a modified Okapi formula is used to score sentences. Experimental results show that the method achieves a best diphone coverage of 93.52% and can generate a good speech corpus.
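The greedy selection described in the abstract can be illustrated with a minimal sketch. The abstract does not give the modified Okapi formula, so the `score` function below is only an Okapi-style stand-in (saturation with length normalization) and is an assumption, as are the names `diphones`, `score`, `greedy_select`, the parameters `k1` and `b`, and the representation of each sentence as a list of phone symbols. Coverage here is measured against the diphones present in the mother text, which may differ from how the paper computes it.

```python
from typing import List, Set, Tuple

Diphone = Tuple[str, str]


def diphones(phones: List[str]) -> Set[Diphone]:
    """Return the set of adjacent phone pairs (diphones) in one sentence."""
    return set(zip(phones, phones[1:]))


def score(new_units: int, length: int, avg_length: float,
          k1: float = 1.2, b: float = 0.75) -> float:
    """Okapi-style score (assumed form, not the paper's modified formula).

    Rewards sentences that add many uncovered diphones and penalizes
    sentences that are long relative to the average sentence length.
    """
    norm = k1 * (1.0 - b + b * length / avg_length)
    return new_units * (k1 + 1.0) / (new_units + norm)


def greedy_select(corpus: List[List[str]],
                  max_sentences: int) -> Tuple[List[int], float]:
    """Greedily pick sentence indices; return them with the coverage reached."""
    units = [diphones(sentence) for sentence in corpus]
    all_units: Set[Diphone] = set().union(*units) if units else set()
    total_length = sum(len(sentence) for sentence in corpus)
    if not all_units or total_length == 0:
        return [], 0.0
    avg_length = total_length / len(corpus)

    covered: Set[Diphone] = set()
    chosen: List[int] = []
    remaining = set(range(len(corpus)))
    while remaining and len(chosen) < max_sentences:
        # Highest score wins; ties go to the lowest sentence index.
        best = max(
            remaining,
            key=lambda i: (score(len(units[i] - covered),
                                 len(corpus[i]), avg_length), -i),
        )
        if not units[best] - covered:
            break  # no remaining sentence adds a new diphone
        chosen.append(best)
        covered |= units[best]
        remaining.remove(best)
    return chosen, len(covered) / len(all_units)


# Toy mother text: each sentence is a list of phone symbols (hypothetical data).
toy_corpus = [["a", "b", "c"], ["a", "b"], ["c", "d"], ["b", "c"]]
selected, coverage = greedy_select(toy_corpus, max_sentences=10)
```

Re-scoring every remaining sentence against the current covered set on each round is what makes the selection greedy: a sentence that looked valuable at the start scores zero once its diphones have been covered by earlier picks.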
Keywords :
greedy algorithms; natural language processing; speech processing; speech synthesis; Diphone unit; Okapi formula; TTS corpus; greedy algorithm; mother text; phonetic contextual units; sentence segmentation; speech synthesis system; speech-music discrimination; text-to-speech technology; web crawler; Concrete; Context modeling; Databases; Greedy algorithms; Large-scale systems; Marine technology; Natural languages; Paper technology; Space technology; Speech synthesis; Okapi; speech corpus; speech synthesis; text selection;
Conference_Title :
Measuring Technology and Mechatronics Automation (ICMTMA), 2010 International Conference on
Conference_Location :
Changsha City
Print_ISBN :
978-1-4244-5001-5
Electronic_ISBN :
978-1-4244-5739-7
DOI :
10.1109/ICMTMA.2010.796