DocumentCode :
2347543
Title :
Part-of-speech tagging for Chinese unknown words in a domain-specific small corpus using morphological and contextual rules
Author :
Chang, Tao-Hsing ; Hsu, Fu-Yuan ; Lee, Chia-Hoang ; Lee, Hahn-Ming
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Nat. Kaohsiung Univ. of Appl. Sci., Kaohsiung, Taiwan
fYear :
2010
fDate :
21-23 Aug. 2010
Firstpage :
1
Lastpage :
6
Abstract :
Many studies have tried to search for useful information on the Internet using meaningful terms or words. The performance of these approaches is often affected by the accuracy of unknown-word extraction and part-of-speech (POS) tagging, and that accuracy in turn depends on the size of the training corpora and the characteristics of the language. This work proposes and develops a method that focuses on tagging the POS of Chinese unknown words in the domain of interest, integrating morphological rules, contextual rules, and a statistics-based method. Experimental results indicate that the proposed method can overcome the difficulties resulting from small corpora in oriental languages, and can accurately tag unknown words with their POS in domain-specific small corpora.
Keywords :
natural language processing; statistical analysis; text analysis; Chinese Unknown Words; Internet; contextual rules; morphological rule; part-of-speech tagging; statistics-based method; Accuracy; Classification algorithms; Computational linguistics; Computer science; Hidden Markov models; Tagging; Training; Chinese unknown word; Contextual rule; Domain-specific corpus; Morphological rule; Part-of-speech tagging;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Natural Language Processing and Knowledge Engineering (NLP-KE), 2010 International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-6896-6
Type :
conf
DOI :
10.1109/NLPKE.2010.5587771
Filename :
5587771