DocumentCode
1938327
Title
A New Machine Learning Method for Chinese Overlapping Disambiguity--Conditional Random Fields
Author
Xiong, Ying ; Zhu, Jie
Author_Institution
Shanghai Jiao Tong Univ., Shanghai
Volume
7
fYear
2007
fDate
19-22 Aug. 2007
Firstpage
3922
Lastpage
3926
Abstract
Conditional random fields (CRFs) are employed in this paper to resolve overlapping ambiguity in Chinese word segmentation. Whereas traditional methods treat Chinese overlapping ambiguity as a classification problem, the proposed approach regards the task as a sequence labeling problem. The main advantage of this method is that it can handle overlapping ambiguous strings of any length, regardless of whether the ambiguity is pseudo ambiguity or true ambiguity. Several methods are tested on the same training and test corpora. The experimental results show that the CRF models achieve state-of-the-art performance: compared with the maximum entropy classifier and the traditional word bigram model, accuracy increases by 3.98% and 9.27%, respectively.
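To illustrate what treating overlapping ambiguity as sequence labeling can look like, here is a minimal sketch of the decoding step only. The abstract does not state the tag set, features, or toolkit, so everything below is an assumption: a BMES character tag set (B = begin, M = middle, E = end of a multi-character word, S = single-character word), hand-set emission scores standing in for the weighted feature sums a trained linear-chain CRF would produce, and zero-cost transitions that only forbid invalid tag pairs. The ambiguous string 研究生命 can be read as 研究/生命 or 研究生/命, and Viterbi picks whichever full tag sequence scores highest. Because labeling is per character, the string can be of any length.

```python
from itertools import product

# Assumed BMES tag set; the paper's actual labels are not given in the abstract.
TAGS = ("B", "M", "E", "S")
START_TAGS = ("B", "S")  # a sentence must start a new word
END_TAGS = ("E", "S")    # and must finish its last word
NEG = float("-inf")

def words_to_tags(words):
    """Convert a segmented word list into one BMES tag per character."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags

def tags_to_words(chars, tags):
    """Rebuild words from characters and a BMES tag sequence."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # tolerate an unterminated final word
        words.append(current)
    return words

def viterbi(emissions, transitions):
    """Highest-scoring tag sequence under a linear-chain score.

    emissions[i][t]: score of tag t at position i (in a CRF, a weighted sum
    of feature functions). transitions[(s, t)]: score of tag s followed by t;
    pairs missing from the dict are treated as forbidden (-inf).
    """
    best = [{t: (emissions[0].get(t, NEG) if t in START_TAGS else NEG) for t in TAGS}]
    back = [{}]
    for i in range(1, len(emissions)):
        best.append({})
        back.append({})
        for t in TAGS:
            prev, score = max(
                ((s, best[i - 1][s] + transitions.get((s, t), NEG)) for s in TAGS),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + emissions[i].get(t, NEG)
            back[i][t] = prev
    last = max(END_TAGS, key=lambda t: best[-1][t])
    path = [last]
    for i in range(len(emissions) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

# Hypothetical example data (invented scores, not from the paper).
CHARS = "研究生命"
EMISSIONS = [
    {"B": 2.0, "M": 0.0, "E": 0.0, "S": 0.5},  # 研
    {"B": 0.0, "M": 0.8, "E": 1.5, "S": 0.1},  # 究
    {"B": 1.2, "M": 0.2, "E": 0.9, "S": 0.3},  # 生
    {"B": 0.0, "M": 0.0, "E": 1.1, "S": 0.6},  # 命
]
ALLOWED = [("B", "M"), ("B", "E"), ("M", "M"), ("M", "E"),
           ("E", "B"), ("E", "S"), ("S", "B"), ("S", "S")]
TRANSITIONS = {pair: 0.0 for pair in ALLOWED}

if __name__ == "__main__":
    tags = viterbi(EMISSIONS, TRANSITIONS)
    print(tags, tags_to_words(CHARS, tags))
```

With these invented scores the decoder returns `['B', 'E', 'B', 'E']`, i.e. 研究/生命 (total 5.8, versus 4.3 for 研究生/命). In a real system the emission and transition scores would be learned from a segmented training corpus rather than set by hand.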
Keywords
entropy; learning (artificial intelligence); natural language processing; pattern classification; random processes; Chinese overlapping ambiguity; Chinese overlapping disambiguity; Chinese word segmentation; classification problem; conditional random fields; machine learning method; sequence labeling problem; Cybernetics; Educational institutions; Entropy; Hidden Markov models; Humans; Labeling; Learning systems; Machine learning; Support vector machines; Testing; Chinese word segmentation; Conditional random fields; Maximum Entropy classifier; Overlapping ambiguity; Word bigram model;
fLanguage
English
Publisher
ieee
Conference_Titel
Machine Learning and Cybernetics, 2007 International Conference on
Conference_Location
Hong Kong
Print_ISBN
978-1-4244-0973-0
Electronic_ISBN
978-1-4244-0973-0
Type
conf
DOI
10.1109/ICMLC.2007.4370831
Filename
4370831