DocumentCode :
2861093
Title :
The Design of Continuous Speech Corpus Based on Half-Syllable Tibetan
Author :
Yang Yangrui ; Yu Hongzhi ; Li Yonghong
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Northwest Univ. for Nat., Lanzhou, China
fYear :
2009
fDate :
11-13 Dec. 2009
Firstpage :
1
Lastpage :
4
Abstract :
The performance of a large-vocabulary continuous speech recognition system depends largely on the quality of its speech corpus, and corpus design in turn hinges on corpus selection. Following ancient Tibetan phonology, a continuous speech corpus based on Tibetan half-syllables is built. Each word in tens of thousands of sentences of Tibetan text is separated into an initial and a final. The inner-syllabic initial-final combinations and the inter-syllabic final-initial combinations are then counted. Finally, a corpus extraction algorithm based on the coverage and sparseness of the half-syllabic combinations is designed to produce a continuous speech corpus with high quality and low redundancy.
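The abstract does not specify the extraction algorithm, so the following is only a minimal sketch of one plausible reading: a greedy selection that favors sentences contributing many not-yet-covered half-syllabic combinations, with rare combinations weighted more heavily. The function names, the `(initial, final)` representation of a syllable, and the inverse-frequency weighting are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter

# Assumption: a sentence is a list of syllables, and each syllable is an
# (initial, final) pair of strings. Real Tibetan segmentation is out of scope.

def half_syllable_units(sentence):
    """Return the set of half-syllabic combinations found in one sentence."""
    units = set()
    for i, (initial, final) in enumerate(sentence):
        # Inner-syllabic combination: initial followed by final.
        units.add(("inner", initial, final))
        if i + 1 < len(sentence):
            # Inter-syllabic combination: this final followed by the next initial.
            units.add(("inter", final, sentence[i + 1][0]))
    return units

def count_units(sentences):
    """Count, for each combination, how many sentences contain it."""
    counts = Counter()
    for sentence in sentences:
        counts.update(half_syllable_units(sentence))
    return counts

def select_sentences(sentences, max_sentences=None):
    """Greedily pick sentences until no sentence adds a new combination.

    Hypothetical scoring: each uncovered combination contributes 1 / (number
    of sentences containing it), so sparse combinations count for more.
    """
    counts = count_units(sentences)
    unit_sets = [half_syllable_units(s) for s in sentences]
    covered, chosen = set(), []
    remaining = set(range(len(sentences)))

    def score(idx):
        return sum(1.0 / counts[u] for u in unit_sets[idx] - covered)

    while remaining and (max_sentences is None or len(chosen) < max_sentences):
        best = max(sorted(remaining), key=score)
        if score(best) == 0:
            break  # every remaining sentence is fully redundant
        chosen.append(best)
        covered |= unit_sets[best]
        remaining.remove(best)
    return chosen, covered
```

Calling `select_sentences(sentences)` returns the indices of the chosen sentences and the set of combinations they cover; sentences whose combinations are already covered are never selected, which is one way to keep redundancy low.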
Keywords :
speech recognition; vocabulary; Tibetan texts; corpus extraction; half-syllabic combinations; half-syllable Tibetan; inner syllabic initial-final combinations; inter syllabic final-initial combinations; vocabulary continuous speech recognition system; Algorithm design and analysis; Data mining; Design engineering; Environmental economics; Natural languages; Speech recognition; Speech synthesis; Statistics; Vocabulary; Writing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computational Intelligence and Software Engineering, 2009. CiSE 2009. International Conference on
Conference_Location :
Wuhan
Print_ISBN :
978-1-4244-4507-3
Electronic_ISBN :
978-1-4244-4507-3
Type :
conf
DOI :
10.1109/CISE.2009.5366048
Filename :
5366048