Title :
Domain adaptive Chinese Word Segmentation based on domain knowledge and word-formation feature
Author :
Qin, Xiao ; Wu, Yuqian
Author_Institution :
Inst. of Comput. Sci. & Technol., Peking Univ., Beijing, China
Abstract :
This paper describes a novel method for domain-adaptive Chinese word segmentation. Unlike traditional methods, our system takes advantage of domain knowledge and word-formation features. First, we construct a body of general knowledge by bootstrapping, which contains domain-independent information. A cross-domain model is then generated from the general knowledge and cross-domain knowledge. Furthermore, because word formation is important in word segmentation, the segmentation results are revised with a word-formation strategy. Word formation has received little study, and this strategy plays a significant role in our results. We test our system on the corpora provided by CIPS-SIGHAN 2010, where it achieves an F-score above 0.94 in all four domains. This performance demonstrates the effectiveness of our approach.
Keywords :
learning (artificial intelligence); natural language processing; statistical analysis; word processing; bootstrapping; cross-domain knowledge; cross-domain model; domain adaptive Chinese word segmentation; domain knowledge; general knowledge; word-formation feature; AV feature; Chinese word segmentation; domain adaption; word formation
Conference_Title :
Natural Language Processing and Knowledge Engineering (NLP-KE), 2011 7th International Conference on
Conference_Location :
Tokushima
Print_ISBN :
978-1-61284-729-0
DOI :
10.1109/NLPKE.2011.6138223