DocumentCode :
3502486
Title :
A Chinese Word Segmentation Based on Machine Learning
Author :
Hongsheng, Wang ; Mingming, Cui
Author_Institution :
Coll. of Inf. Sci. & Eng., Shenyang Univ. of Technol., Shenyang
Volume :
2
fYear :
2009
fDate :
7-8 March 2009
Firstpage :
610
Lastpage :
613
Abstract :
Unlike English, Chinese has no delimiters between words. Segmenting Chinese text into words is the first step in every kind of Chinese information processing, so Chinese word segmentation is a fundamental and difficult problem in the field. Traditional word segmentation systems require a manually built dictionary, and unknown words missing from the dictionary must be added by hand. This paper proposes a new Chinese word segmentation model that uses machine learning to build a dictionary automatically and then gradually update and refine it. The four modules of the machine learning model are introduced in detail, and the algorithms in some modules are improved to raise system performance. Tests on a closed corpus and an open corpus show that the method reduces the workload of building and maintaining the dictionary and that it resolves the issues of ambiguity processing and unknown word recognition.
Keywords :
learning (artificial intelligence); natural language processing; word processing; Chinese information processing; Chinese word segmentation; English; dictionary; machine learning; Dictionaries; Educational institutions; Educational technology; Information processing; Information science; Machine learning algorithms; Natural languages; Probability; Speech recognition; ambiguity processing; artificial dictionary; unknown words recognition
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Education Technology and Computer Science, 2009. ETCS '09. First International Workshop on
Conference_Location :
Wuhan, Hubei
Print_ISBN :
978-1-4244-3581-4
Type :
conf
DOI :
10.1109/ETCS.2009.397
Filename :
4959112