DocumentCode :
3141469
Title :
A comparison study of candidate generation for Chinese word segmentation
Author :
Zhang, Kaixu ; Sun, Maosong
Author_Institution :
State Key Lab. of Intell. Technol. & Syst., Tsinghua Univ., Beijing, China
fYear :
2011
fDate :
27-29 Nov. 2011
Firstpage :
60
Lastpage :
67
Abstract :
Chinese word segmentation (CWS) can be implemented in a coarse-to-fine schema. In such a schema, a coarse-grained CWS model outputs a candidate set containing multiple segmentations of a sentence rather than a single segmentation. A more sophisticated CWS model, or a model for a downstream task, then reconsiders all the segmentations in the candidate set to determine the best one. This paper discusses and compares three candidate generation methods in a unified form: the boundary level method, the word level method, and the sentence level method. The oracle F1-measures of the candidate sets produced by these methods are compared, as is their performance in a joint CWS and POS-tagging task. The results show that the word level method performs best among the three candidate generation methods. They also show that the coarse-to-fine schema outperforms both the pipeline schema, in which only one segmentation is used for the downstream task, and the joint schema, in which all possible segmentations are used for the downstream task. Moreover, the speed of the coarse-to-fine schema is close to that of the pipeline schema and much higher than that of the joint schema.
Keywords :
natural language processing; word processing; Chinese word segmentation; boundary level method; candidate generation method; coarse-grained CWS model; coarse-to-fine segmentation; sentence level method; sentence segmentation; word level method; Decoding; Sun; part-of-speech tagging;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Natural Language Processing and Knowledge Engineering (NLP-KE), 2011 7th International Conference on
Conference_Location :
Tokushima
Print_ISBN :
978-1-61284-729-0
Type :
conf
DOI :
10.1109/NLPKE.2011.6138170
Filename :
6138170