DocumentCode :
3423388
Title :
A conditional random fields model for overlapping ambiguity resolution in Chinese word segmentation
Author :
Liang, Yan ; Zhu, Yaoting
fYear :
2009
fDate :
17-19 Aug. 2009
Firstpage :
384
Lastpage :
389
Abstract :
Overlapping ambiguity is one type of ambiguity in Chinese word segmentation. Research on overlapping ambiguity has so far focused on 3-character overlapping ambiguity strings. This paper empirically examines the distribution and forms of overlapping ambiguity strings. To handle overlapping ambiguity strings of different forms simultaneously, a conditional random fields model is used. Several features for overlapping ambiguity resolution are explored, including component independency probability, component co-occurrence probability, the in-word probability of a component, and string structure. Experimental results show that precision reaches 93.81% in the open test.
Keywords :
natural languages; text analysis; Chinese word segmentation; component co-occurrence probability; component independency probability; conditional random fields model; in-word probability; overlapping ambiguity resolution; Bayesian methods; Educational institutions; Labeling; Natural language processing; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Granular Computing, 2009, GRC '09. IEEE International Conference on
Conference_Location :
Nanchang
Print_ISBN :
978-1-4244-4830-2
Type :
conf
DOI :
10.1109/GRC.2009.5255092
Filename :
5255092