Title :
A morpheme-based lexical chunking system for Chinese
Author :
Fu, Guo-hong ; Kit, Chun-yu ; Webster, Jonathan J.
Author_Institution :
Sch. of Comput. Sci. & Technol., Heilongjiang Univ., Harbin
Abstract :
Chinese lexical analysis consists of word segmentation and part-of-speech tagging. Most previous studies treat these as two separate tasks. In this paper we formalize the two processes as a single chunking task over a sequence of morphemes and present an integrated lexical analysis system for Chinese based on lexicalized hidden Markov models. In this way, both contextual lexical information and word-internal morphological features can be explored statistically and combined for disambiguation and unknown word resolution. Experimental results show that the proposed system outperforms several baselines, illustrating the benefits of the unified lexical chunking method with morphemes as the basic units.
Keywords :
hidden Markov models; natural language processing; Chinese lexical analysis; morpheme-based lexical chunking system; part-of-speech tagging; statistical analysis; word segmentation; word-internal morphological feature; Computer science; Cybernetics; Information analysis; Information retrieval; Machine learning; Morphology; Natural languages; Tagging; Lexical chunking
Conference_Title :
Machine Learning and Cybernetics, 2008 International Conference on
Conference_Location :
Kunming
Print_ISBN :
978-1-4244-2095-7
Electronic_ISBN :
978-1-4244-2096-4
DOI :
10.1109/ICMLC.2008.4620820