DocumentCode
1954287
Title
A Letter Tagging Approach to Uyghur Tokenization
Author
Aisha, Batuer
fYear
2010
fDate
28-30 Dec. 2010
Firstpage
11
Lastpage
14
Abstract
In this paper, we present a letter tagging approach (LTA) to Uyghur tokenization. Experiments show that the label bias problem (rich and complex suffixes) can be resolved by combining LTA with CRFs. The approach is more effective than previous work, and the accuracy of word tokenization reaches 93.3%. In the future, our tokenization research should be useful for information processing in other Altaic languages.
Keywords
identification technology; natural language processing; altaic language information processing; label bias; letter tagging; uyghur tokenization; word tokenization; Accuracy; Hidden Markov models; Information processing; Labeling; Natural language processing; Tagging; Training; Letter tagging approach; Morpheme analysis (MA); Tokenization;
fLanguage
English
Publisher
IEEE
Conference_Titel
Asian Language Processing (IALP), 2010 International Conference on
Conference_Location
Harbin
Print_ISBN
978-1-4244-9063-9
Type
conf
DOI
10.1109/IALP.2010.72
Filename
5681556