DocumentCode :
3466939
Title :
Khmer POS Tagger: A Transformation-based Approach with Hybrid Unknown Word Handling
Author :
Nou, Chenda ; Kameyama, Wataru
Author_Institution :
WASEDA Univ., Honjo
fYear :
2007
fDate :
17-19 Sept. 2007
Firstpage :
482
Lastpage :
492
Abstract :
This paper presents initial research on a Khmer part-of-speech tagger. We propose modifications to the rule-application algorithm of the transformation-based approach to adapt it to Khmer, which differs morphologically and syntactically from English. Furthermore, to overcome the limited coverage of the rule-based approach in handling unknown words, we propose a hybrid approach that combines the rule-based and trigram models. Although trained on a very small corpus, both proposed approaches achieve higher accuracy than conventional methods. The tagger achieves 95.27% accuracy on the training data and 91.96% on the test data, which includes 9% unknown words.
Keywords :
grammars; natural language processing; Khmer language; Khmer part-of-speech tagger; rule-based model; transformation-based approach; trigram model; Data mining; Error analysis; Humans; Natural language processing; Natural languages; Stochastic processes; Tagging; Telecommunication computing; Testing; Training data;
fLanguage :
English
Publisher :
ieee
Conference_Title :
Semantic Computing, 2007. ICSC 2007. International Conference on
Conference_Location :
Irvine, CA
Print_ISBN :
978-0-7695-2997-4
Type :
conf
DOI :
10.1109/ICSC.2007.104
Filename :
4338385