DocumentCode :
578162
Title :
Integrating N-gram model information for Chinese word segmentation based on conditional random fields
Author :
Ying Xiong
Author_Institution :
Dept. Sch. of Electron. & Inf. Eng., Tongji Univ., Shanghai, China
Volume :
2
fYear :
2012
fDate :
15-17 July 2012
Firstpage :
762
Lastpage :
766
Abstract :
This paper presents a Chinese word segmentation system based on conditional random fields (CRFs) that integrates the output of an N-gram model as CRF features. The motivation is that the two models are complementary: a dictionary-based N-gram model handles in-vocabulary words very well, while CRFs are better at recognizing out-of-vocabulary words. The approach is evaluated on the PKU data from the SIGHAN Bakeoff 2005. Experimental results show that the method achieves an F-measure of 95.0%, together with higher out-of-vocabulary recall (Roov, 85.2%) and in-vocabulary recall (Riv, 97.9%).
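The abstract does not give the paper's feature template or evaluation code. The sketch below is therefore only a minimal illustration under stated assumptions. It turns an N-gram segmenter's output into per-character B/M/E/S tags and adds that tag to ordinary character-context features. It then scores a segmentation with precision, recall, F-measure, Roov and Riv in the usual SIGHAN style: a word counts as correct when its character span matches the gold span exactly, and Roov/Riv are recall over gold words absent from or present in the training vocabulary. The N-gram segmenter itself is not implemented; its word list is passed in. All function names, feature keys (`C-1`, `C0`, `NGRAM_TAG`, ...), the BMES tag set and the `^`/`$` boundary symbols are hypothetical choices, not the author's.

```python
from typing import Dict, List, Set, Tuple


def bmes_tags(words: List[str]) -> List[str]:
    """Map each character of a segmented sentence to B/M/E/S (hypothetical tag set)."""
    tags: List[str] = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


def char_features(sentence: str, ngram_words: List[str]) -> List[Dict[str, str]]:
    """Per-character CRF features: local character context plus the tag implied
    by an N-gram segmenter's output (the segmenter is assumed, not implemented)."""
    ngram_tags = bmes_tags(ngram_words)
    if len(ngram_tags) != len(sentence):
        raise ValueError("N-gram segmentation must cover the sentence exactly")
    padded = "^" + sentence + "$"  # assumed boundary symbols
    feats: List[Dict[str, str]] = []
    for i, tag in enumerate(ngram_tags):
        j = i + 1  # position of character i in the padded string
        feats.append({
            "C-1": padded[j - 1],
            "C0": padded[j],
            "C1": padded[j + 1],
            "C-1C0": padded[j - 1:j + 1],
            "C0C1": padded[j:j + 2],
            "NGRAM_TAG": tag,  # the integrated N-gram information
        })
    return feats


def spans(words: List[str]) -> List[Tuple[int, int]]:
    """Character spans [start, end) of each word."""
    out, start = [], 0
    for w in words:
        out.append((start, start + len(w)))
        start += len(w)
    return out


def evaluate(gold: List[str], pred: List[str], train_vocab: Set[str]) -> Dict[str, float]:
    """Word-level P/R/F plus recall on OOV and IV gold words."""
    if not gold or not pred:
        raise ValueError("gold and pred must be non-empty")
    pred_set = set(spans(pred))
    correct = [s in pred_set for s in spans(gold)]  # one flag per gold word
    n_correct = sum(correct)
    p = n_correct / len(pred)
    r = n_correct / len(gold)
    f = 2 * p * r / (p + r) if p + r else 0.0
    oov = [c for w, c in zip(gold, correct) if w not in train_vocab]
    iv = [c for w, c in zip(gold, correct) if w in train_vocab]
    return {
        "P": p, "R": r, "F": f,
        "Roov": sum(oov) / len(oov) if oov else 0.0,
        "Riv": sum(iv) / len(iv) if iv else 0.0,
    }


if __name__ == "__main__":
    feats = char_features("我爱北京天安门", ["我", "爱", "北京", "天安门"])
    print([f["NGRAM_TAG"] for f in feats])  # ['S', 'S', 'B', 'E', 'B', 'M', 'E']
    print(evaluate(["我", "爱", "北京", "天安门"],
                   ["我", "爱", "北京", "天安", "门"],
                   {"我", "爱", "北京"}))
```

In a real system the feature dictionaries would be fed to a CRF toolkit together with gold BMES labels. Keeping the N-gram tag as just one more feature lets the CRF learn when to trust the dictionary and when to override it, for example on unseen words.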
Keywords :
computational linguistics; natural language processing; random processes; vocabulary; Chinese word segmentation; F-measure; N-gram model information; PKU data; conditional random field; dictionary-based N-gram model; in-vocabulary word; out-of-vocabulary word; Accuracy; Computational modeling; Conditional random fields; N-gram model
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Cybernetics (ICMLC), 2012 International Conference on
Conference_Location :
Xi'an
ISSN :
2160-133X
Print_ISBN :
978-1-4673-1484-8
Type :
conf
DOI :
10.1109/ICMLC.2012.6359021
Filename :
6359021