Title :
Part-of-speech tagging for Chinese unknown words in a domain-specific small corpus using morphological and contextual rules
Author :
Chang, Tao-Hsing ; Hsu, Fu-Yuan ; Lee, Chia-Hoang ; Lee, Hahn-Ming
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Nat. Kaohsiung Univ. of Appl. Sci., Kaohsiung, Taiwan
Abstract :
Many studies have tried to search for useful information on the Internet using meaningful terms or words. The performance of these approaches is often affected by the accuracy of unknown word extraction and POS tagging, which in turn depends on the size of the training corpora and the characteristics of the language. This work proposes and develops a method for tagging the POS of Chinese unknown words in a domain of interest by integrating morphological rules, contextual rules, and a statistics-based method. Experimental results indicate that the proposed method can overcome the difficulties caused by small corpora in oriental languages and can accurately tag unknown words with POS in domain-specific small corpora.
Keywords :
natural language processing; statistical analysis; text analysis; Chinese Unknown Words; Internet; contextual rules; morphological rule; part-of-speech tagging; statistics-based method; Accuracy; Classification algorithms; Computational linguistics; Computer science; Hidden Markov models; Tagging; Training; Chinese unknown word; Contextual rule; Domain-specific corpus; Morphological rule; Part-of-speech tagging
Conference_Title :
Natural Language Processing and Knowledge Engineering (NLP-KE), 2010 International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-6896-6
DOI :
10.1109/NLPKE.2010.5587771