DocumentCode
2347842
Title
A morphology-based Chinese word segmentation method
Author
Lin, Xiaojun ; Zhao, Liang ; Zhang, Meng ; Wu, Xihong
Author_Institution
Key Lab. of Machine Perception & Intell., Speech & Hearing Res. Center, Peking Univ., Beijing, China
fYear
2010
fDate
21-23 Aug. 2010
Firstpage
1
Lastpage
5
Abstract
This paper proposes a novel method of Chinese word segmentation that uses morphology information. The method introduces morphology into a statistical model to capture the structural relationships within words, improving the ability of conventional Conditional Random Fields (CRFs) models to represent structural information. First, a word-segmented Chinese corpus is annotated with morphology tags by a semi-automatic method, and the resulting structure-related tags are integrated into the CRFs model. Second, a joint CRFs model is trained that generates both morphology tags and word boundaries. Experiments carried out on several SIGHAN Bakeoff corpora show that morphology information can significantly improve the performance of Chinese word segmentation, especially for out-of-vocabulary (OOV) words.
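The joint model described in the abstract predicts word boundaries and morphology tags together. A minimal sketch of how such joint labels could be built from an annotated corpus and decoded back is shown below. It assumes the common character-level B/M/E/S boundary scheme and uses hypothetical morphology tag names (AFFIX, COMPOUND, SINGLE); the paper's actual tag set, feature templates, and CRF training are not given in this record, so the CRF itself is left out.

```python
# Sketch only: the B/M/E/S scheme and the morphology tag names used below
# are assumptions, not the paper's documented annotation.

def to_joint_labels(words, morph_tags):
    """Turn a segmented sentence plus one morphology tag per word into
    character-level joint labels such as 'B-AFFIX' for CRF training."""
    if len(words) != len(morph_tags):
        raise ValueError("need exactly one morphology tag per word")
    chars, labels = [], []
    for word, morph in zip(words, morph_tags):
        # Single-character words get S; longer words get B, M..., E.
        if len(word) == 1:
            boundary = ["S"]
        else:
            boundary = ["B"] + ["M"] * (len(word) - 2) + ["E"]
        for ch, b in zip(word, boundary):
            chars.append(ch)
            labels.append(f"{b}-{morph}")
    return chars, labels


def decode(chars, labels):
    """Recover words and their morphology tags from a joint label
    sequence, e.g. one predicted by a trained joint CRF."""
    words, morphs = [], []
    current = ""
    for ch, label in zip(chars, labels):
        boundary, morph = label.split("-", 1)
        current += ch
        if boundary in ("E", "S"):  # a word ends at this character
            words.append(current)
            morphs.append(morph)
            current = ""
    if current:  # tolerate an ill-formed sequence that ends mid-word
        words.append(current)
        morphs.append(morph)
    return words, morphs


if __name__ == "__main__":
    chars, labels = to_joint_labels(
        ["老师们", "喜欢", "看", "书"],
        ["AFFIX", "COMPOUND", "SINGLE", "SINGLE"],  # hypothetical tags
    )
    print(labels)
    print(decode(chars, labels))
```

Running the example prints the joint labels `['B-AFFIX', 'M-AFFIX', 'E-AFFIX', 'B-COMPOUND', 'E-COMPOUND', 'S-SINGLE', 'S-SINGLE']` and then the recovered words with their tags. Combining the two label types into one tag per character lets a single linear-chain CRF predict both outputs at once, which is one plausible reading of "a joint CRFs model"; the paper may structure its labels differently.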
Keywords
computational linguistics; learning (artificial intelligence); natural language processing; statistical analysis; text analysis; SIGHAN bakeoff corpus; conditional random fields; morphology-based Chinese word segmentation method; statistical model; Morphology; Testing; Training; Chinese word segmentation; words out of vocabulary
fLanguage
English
Publisher
IEEE
Conference_Title
Natural Language Processing and Knowledge Engineering (NLP-KE), 2010 International Conference on
Conference_Location
Beijing
Print_ISBN
978-1-4244-6896-6
Type
conf
DOI
10.1109/NLPKE.2010.5587786
Filename
5587786