Regional Information Center for Science and Technology (RICeST) - A Chinese word segmentation algorithm based on maximum entropy

DocumentCode :

2253486

Title :

A Chinese word segmentation algorithm based on maximum entropy

Author :

Zhang, Li-Yan ; Qin, Min ; Zhang, Xue-Mei ; Ma, Hong-Xia

Author_Institution :

Inst. of Inf., Hebei Univ. of Sci. & Technol., Shijiazhuang, China

Volume :

fYear :

2010

fDate :

11-14 July 2010

Firstpage :

1264

Lastpage :

1267

Abstract :

Automatic word segmentation is an important component of modern Chinese information processing and a key technology for Chinese full-text retrieval. This paper presents a Chinese word segmentation algorithm based on maximum entropy. Using the part-of-speech tags and word-frequency tags of a corpus, it builds a maximum entropy model based on mutual information and uses it as the language model for word segmentation. Finally, the binary model was used to test whether expanding the training corpus affects segmentation accuracy, and curves relating the size of the training corpus to segmentation accuracy were obtained.
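The abstract gives no implementation details. As a rough illustration only, the Python sketch below combines the two ideas it names: pointwise mutual information (PMI) between adjacent characters as a feature, and a binary maximum entropy model that decides whether a word boundary falls between two characters. With two labels, a conditional maximum entropy model over these features reduces to logistic regression; here it is trained by plain gradient ascent rather than the classical GIS/IIS procedures. Everything in it is an assumption, not the authors' method: the function names (`char_counts`, `pmi`, `boundary_examples`, `train_maxent`, `segment`), the pseudo-count `FLOOR`, the feature centring and the toy six-sentence corpus are hypothetical, and the paper's part-of-speech and word-frequency tags are not modelled.

```python
import math
from collections import Counter

# Hypothetical sketch: names, toy corpus and FLOOR are illustrative, not from the paper.
FLOOR = 0.5  # pseudo-count used for unseen characters or character pairs


def char_counts(sentences):
    """Count single characters and adjacent character pairs in segmented sentences."""
    uni, bi = Counter(), Counter()
    for words in sentences:
        text = "".join(words)
        uni.update(text)
        bi.update(text[i:i + 2] for i in range(len(text) - 1))
    return uni, bi


def pmi(a, b, uni, bi):
    """Pointwise mutual information log(p(ab) / (p(a) * p(b))) of adjacent characters."""
    p_ab = max(bi[a + b], FLOOR) / sum(bi.values())
    p_a = max(uni[a], FLOOR) / sum(uni.values())
    p_b = max(uni[b], FLOOR) / sum(uni.values())
    return math.log(p_ab / (p_a * p_b))


def boundary_examples(sentences, uni, bi):
    """One (pmi, label) pair per adjacent character pair; label 1 marks a word boundary."""
    examples = []
    for words in sentences:
        text = "".join(words)
        starts, pos = set(), 0
        for w in words[:-1]:
            pos += len(w)
            starts.add(pos)  # index of the first character of the next word
        for i in range(len(text) - 1):
            label = 1 if i + 1 in starts else 0
            examples.append((pmi(text[i], text[i + 1], uni, bi), label))
    return examples


def train_maxent(examples, lr=1.0, iters=500):
    """Fit p(boundary | x) = sigmoid(w0 + w1 * (x - c)) by gradient ascent on log-likelihood."""
    n = len(examples)
    c = sum(x for x, _ in examples) / n  # centre the PMI feature on its training mean
    w0 = w1 = 0.0
    for _ in range(iters):
        g0 = g1 = 0.0
        for x, y in examples:
            p = 1.0 / (1.0 + math.exp(-(w0 + w1 * (x - c))))
            g0 += y - p
            g1 += (y - p) * (x - c)
        w0 += lr * g0 / n
        w1 += lr * g1 / n
    return w0, w1, c


def segment(text, model, uni, bi):
    """Split text wherever the model's boundary probability exceeds 0.5."""
    w0, w1, c = model
    words, start = [], 0
    for i in range(len(text) - 1):
        score = w0 + w1 * (pmi(text[i], text[i + 1], uni, bi) - c)
        if score > 0:  # sigmoid(score) > 0.5
            words.append(text[start:i + 1])
            start = i + 1
    words.append(text[start:])
    return words


# Toy pre-segmented training corpus (hypothetical): all orderings of three words.
corpus = [
    ["中国", "人民", "银行"], ["人民", "银行", "中国"], ["银行", "中国", "人民"],
    ["中国", "银行", "人民"], ["人民", "中国", "银行"], ["银行", "人民", "中国"],
]
uni, bi = char_counts(corpus)
model = train_maxent(boundary_examples(corpus, uni, bi))
print(segment("中国银行", model, uni, bi))  # ['中国', '银行']
```

Within-word pairs such as 中国 get a higher PMI than pairs spanning a boundary such as 国银, so the learned PMI weight comes out negative and low-PMI positions become boundaries. Centring the feature keeps the bias and PMI weights from being strongly correlated, which helps plain gradient ascent; a real system would add richer features and usually a Gaussian prior on the weights.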

Keywords :

entropy; information retrieval; text analysis; word processing; Chinese full-text retrieval; Chinese information processing; Chinese word segmentation algorithm; binary model; maximum entropy model; part-of-speech tagging; word frequency tagging; word segmentation language model; Accuracy; Computational modeling; Context; Entropy; Mathematical model; Probability; Training; Chinese full text retrieval; Maximum entropy; Word segmentation algorithm;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

2010 International Conference on Machine Learning and Cybernetics (ICMLC)

Conference_Location :

Qingdao

Print_ISBN :

978-1-4244-6526-2

Type :

conf

DOI :

10.1109/ICMLC.2010.5580902

Filename :

5580902

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2253486