• DocumentCode
    514557
  • Title

    Automatic Construction for a TTS Corpus with Limited Text

  • Author

    Zhang Wei ; Liu Yayu ; Deng Ye ; Pang Minhui

  • Author_Institution
    Dept. of Comput. Sci. & Technol., Ocean Univ. of China, Qingdao, China
  • Volume
    1
  • fYear
    2010
  • fDate
    13-14 March 2010
  • Firstpage
    707
  • Lastpage
    710
  • Abstract
    This paper presents a method for automatically constructing a text corpus for a speech synthesis system from limited source text. The goal is to collect phonetically rich sentences that give high coverage of phonetic contextual units while keeping the text size small. We present a new greedy algorithm that selects sentences from a mother text. The mother text is downloaded automatically by a web crawler and then processed with speech-music discrimination and sentence segmentation; what remains serves as the mother text. Because the source text is limited in this way, the approach differs from traditional speech corpus construction. The assembled mother text contains about 4612 sentences. The diphone is used as the basic unit, and sentences are scored with a modified Okapi formula. In the experiments, the best diphone coverage achieved is 93.52%, which indicates that the method can generate a good speech corpus.
  • Keywords
    greedy algorithms; natural language processing; speech processing; speech synthesis; Diphone unit; Okapi formula; TTS corpus; greedy algorithm; mother text; phonetic contextual units; sentence segmentation; speech synthesis system; speech-music discrimination; text-to-speech technology; web crawler; Concrete; Context modeling; Databases; Greedy algorithms; Large-scale systems; Marine technology; Natural languages; Paper technology; Space technology; Speech synthesis; Okapi; speech corpus; speech synthesis; text selection;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Measuring Technology and Mechatronics Automation (ICMTMA), 2010 International Conference on
  • Conference_Location
    Changsha City
  • Print_ISBN
    978-1-4244-5001-5
  • Electronic_ISBN
    978-1-4244-5739-7
  • Type

    conf

  • DOI
    10.1109/ICMTMA.2010.796
  • Filename
    5458487
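
The greedy selection described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the modified Okapi score is not specified in this record, so the sketch substitutes a plain "number of newly covered diphones" score as a stand-in. The function names `diphones` and `greedy_select`, the `max_sentences` parameter, and the phone-list input format are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

Diphone = Tuple[str, str]


def diphones(phones: List[str]) -> Set[Diphone]:
    """Return the set of adjacent phone pairs (diphones) in one sentence."""
    return set(zip(phones, phones[1:]))


def greedy_select(
    sentences: Dict[str, List[str]], max_sentences: int
) -> Tuple[List[str], float]:
    """Greedily pick sentences that add the most uncovered diphones.

    `sentences` maps a sentence id to its phone sequence (hypothetical format).
    The score here is the count of new diphones, a stand-in for the
    modified Okapi score mentioned in the abstract.
    Returns the chosen ids and the fraction of mother-text diphones covered.
    """
    units = {sid: diphones(p) for sid, p in sentences.items()}
    universe: Set[Diphone] = set()
    for u in units.values():
        universe |= u

    covered: Set[Diphone] = set()
    chosen: List[str] = []
    while len(chosen) < max_sentences:
        best_id, best_gain = None, 0
        for sid, u in units.items():
            if sid in chosen:
                continue
            gain = len(u - covered)
            if gain > best_gain:  # ties keep the earliest sentence
                best_id, best_gain = sid, gain
        if best_id is None:  # no remaining sentence adds a new diphone
            break
        chosen.append(best_id)
        covered |= units[best_id]

    coverage = len(covered) / len(universe) if universe else 0.0
    return chosen, coverage


if __name__ == "__main__":
    mother_text = {
        "s1": ["a", "b", "c"],
        "s2": ["b", "c", "d"],
        "s3": ["a", "b"],
    }
    print(greedy_select(mother_text, max_sentences=2))
```

Coverage here is measured against the diphones present in the mother text, so a value below 1.0 means the sentence budget ran out before every available diphone was covered.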