DocumentCode :
389283
Title :
Research of automatic Chinese word segmentation
Author :
Liu, Kai-Ying ; Zheng, Jia-heng
Author_Institution :
Comput. Sci. Dept., Shanxi Univ., China
Volume :
2
fYear :
2002
fDate :
2002
Firstpage :
805
Abstract :
Automatic Chinese word segmentation is the fundamental task of Chinese information processing. At present, ambiguous phrase segmentation and proper name recognition are two obstacles to the performance of Chinese word segmentation systems. We apply a corpus-based method to extract various language phenomena from real texts and combine a statistical model with rules in Chinese word segmentation, which increases segmentation precision by improving ambiguous phrase segmentation and unknown word recognition. Finally, we describe a Chinese word segmentation system developed by Shanxi University.
Keywords :
character recognition; inference mechanisms; knowledge based systems; learning (artificial intelligence); natural languages; text analysis; Chinese information processing; Shanxi University; ambiguous phrase segmentation; automatic Chinese word segmentation; corpus-based method; language phenomena; proper name recognition; statistical model; unknown word recognition; Character recognition; Computer science; Data mining; Information processing; Information retrieval; Large-scale systems; Modems; Natural languages; Text categorization; Text recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Machine Learning and Cybernetics, 2002. Proceedings. 2002 International Conference on
Print_ISBN :
0-7803-7508-4
Type :
conf
DOI :
10.1109/ICMLC.2002.1174493
Filename :
1174493