DocumentCode :
2253486
Title :
A Chinese word segmentation algorithm based on maximum entropy
Author :
Zhang, Li-Yan ; Qin, Min ; Zhang, Xue-Mei ; Ma, Hong-Xia
Author_Institution :
Inst. of Inf., Hebei Univ. of Sci. & Technol., Shijiazhuang, China
Volume :
3
fYear :
2010
fDate :
11-14 July 2010
Firstpage :
1264
Lastpage :
1267
Abstract :
Automatic word segmentation is an important component of modern Chinese information processing and a key technology for Chinese full-text retrieval. This paper presents a Chinese word segmentation algorithm based on maximum entropy. It uses the part-of-speech tagging and word frequency tagging of a corpus to build a maximum entropy model based on mutual information, which serves as the language model for word segmentation. Finally, a binary model is used to test whether expanding the training corpus affects word segmentation accuracy, and curves relating the expansion of the training corpus to word segmentation accuracy are obtained.
Keywords :
entropy; information retrieval; text analysis; word processing; Chinese full-text retrieval; Chinese information processing; Chinese word segmentation algorithm; binary model; maximum entropy model; part-of-speech tagging; word frequency tagging; word segmentation language model; Accuracy; Computational modeling; Context; Entropy; Mathematical model; Probability; Training; Chinese full text retrieval; Maximum entropy; Word segmentation algorithm;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Cybernetics (ICMLC), 2010 International Conference on
Conference_Location :
Qingdao
Print_ISBN :
978-1-4244-6526-2
Type :
conf
DOI :
10.1109/ICMLC.2010.5580902
Filename :
5580902