Title :
Using context and semantic resources for cross-domain word segmentation
Author :
Tong, Deqin ; Huang, Degen ; Zhang, Jing
Author_Institution :
Dalian Univ. of Technol., Dalian, China
Abstract :
Chinese word segmentation (CWS) plays a fundamental role in Chinese language processing, because almost all Chinese language processing tasks assume segmented input. After many years of active research, most systems reported in evaluation tasks achieve impressive results. However, most of these results are limited to test corpora from a specific domain; when a system is applied to a different domain, its accuracy drops sharply. For this reason, domain-adaptive word segmentation was introduced into the Bakeoffs. In this paper, we propose a new joint decoding strategy that combines character-based and word-based conditional random field (CRF) models, taking the part-of-speech of dictionary words as an important feature of a segmentation path. Moreover, in keeping with the characteristics of cross-domain segmentation, context information is used to guide CWS. In addition, because synonyms occur in similar contexts, semantic information can be used to recall some out-of-vocabulary words (OOVs). Experiments on the simplified Chinese test data from SIGHAN Bakeoff 2010 show that the method is effective. Except in the literature domain, the F-scores are higher than the best results of the corresponding open test. The OOV recall rates reach 70.7%, 84.3%, 79.0% and 86.2% on the test domains, respectively.
Keywords :
natural language processing; word processing; Chinese language processing; Chinese word segmentation; character-based conditional random field model; context resource; cross-domain word segmentation; decoding strategy; domain-adaptive word segmentation; out-of-vocabularies; part-of-speech; semantic resource; word-based conditional random field model; Artificial intelligence; Computers; Context; Finance; Labeling; CRFs; context variables; cross-domain CWS; joint decoding; semantic resources
Conference_Title :
Natural Language Processing and Knowledge Engineering (NLP-KE), 2011 7th International Conference on
Conference_Location :
Tokushima
Print_ISBN :
978-1-61284-729-0
DOI :
10.1109/NLPKE.2011.6138199