Title :
Design of the Tibetan Continuous Speech Corpus Based on Triphone
Author :
Li, Yong ; Yang, Yangrui
Author_Institution :
Xinjiang Normal Univ., Urumqi, China
fDate :
Nov. 30 2009-Dec. 1 2009
Abstract :
The performance of a large-vocabulary continuous speech recognition system depends largely on the quality of its speech corpus, and the choice of sentences is the key step in corpus design. Taking the Tibetan Amdo dialect spoken in XiaHe as the research object, this paper builds a continuous speech corpus based on triphones. First, we collect a text corpus of 100 thousand sentences and transcribe it into IPA according to the XiaHe pronunciation. We then summarize the structure of triphone junctures and use a text-processing platform to analyze statistically the combination types and frequencies of the triphones in the corpus. Finally, taking into account both the coverage rate and the sparseness of triphones and class-triphones, we design a corpus extraction algorithm and implement automatic corpus selection.
Keywords :
mobile handsets; natural language processing; speech recognition; text analysis; vocabulary; Tibetan continuous speech corpus; corpus design; large vocabulary continuous speech recognition system; speech corpus quality; text-processing platform; triphone; triphone juncture structure; Adhesives; Algorithm design and analysis; Flexible printed circuits; Frequency; Globalization; Government; Knowledge acquisition; Natural languages; Speech recognition; Speech synthesis; Tibetan; class-triphone; phone; speech corpus; triphone;
Conference_Title :
Knowledge Acquisition and Modeling, 2009. KAM '09. Second International Symposium on
Conference_Location :
Wuhan
Print_ISBN :
978-0-7695-3888-4
DOI :
10.1109/KAM.2009.118
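Illustrative sketch :
The abstract describes counting triphones in a phone-transcribed text corpus and then selecting sentences by coverage and sparseness, but it does not give the algorithm itself. The following is a minimal sketch of one common way to do this (greedy selection weighted by triphone rarity), not the authors' method. The function names, the "sil" boundary padding, the 1/frequency weighting and the toy phone symbols are all assumptions made for illustration; class-triphones are not modelled.

```python
from collections import Counter


def triphones(phones):
    """Return the (left, centre, right) triphones of one phone sequence.

    The sequence is padded with a hypothetical 'sil' boundary symbol
    (an assumption; the paper's juncture handling is not specified).
    """
    padded = ["sil"] + list(phones) + ["sil"]
    return [tuple(padded[i:i + 3]) for i in range(len(padded) - 2)]


def triphone_counts(sentences):
    """Count how often each triphone occurs across all sentences."""
    counts = Counter()
    for phones in sentences:
        counts.update(triphones(phones))
    return counts


def greedy_select(sentences, max_sentences):
    """Greedily pick sentences that add the most not-yet-covered triphones.

    Each new triphone contributes 1 / corpus_frequency to the score, so rare
    (sparse) triphones are favoured. This weighting is an assumption.
    Returns (chosen sentence indices, fraction of triphone types covered).
    """
    counts = triphone_counts(sentences)
    covered = set()
    chosen = []
    remaining = set(range(len(sentences)))

    def gain(i):
        new = set(triphones(sentences[i])) - covered
        return sum(1.0 / counts[t] for t in new)

    while remaining and len(chosen) < max_sentences:
        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(remaining), key=gain)
        if gain(best) == 0:
            break  # nothing left to gain: every triphone type is covered
        chosen.append(best)
        covered |= set(triphones(sentences[best]))
        remaining.remove(best)

    coverage = len(covered) / len(counts) if counts else 0.0
    return chosen, coverage


if __name__ == "__main__":
    # Toy phone sequences standing in for IPA-transcribed sentences.
    corpus = [["a", "b"], ["a", "b"], ["c"]]
    print(greedy_select(corpus, max_sentences=3))
```

The duplicate sentence in the toy corpus is never selected, because it contributes no new triphone once its twin has been chosen. A real selection pass would also need a stopping criterion tied to a target coverage rate.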