Title :
A conditional random fields model for overlapping ambiguity resolution in Chinese word segmentation
Author :
Liang, Yan ; Zhu, Yaoting
Abstract :
Overlapping ambiguity is one type of ambiguity in Chinese word segmentation. To date, research on overlapping ambiguity has focused on 3-character overlapping ambiguity strings. In this paper, the distribution and forms of overlapping ambiguity strings are examined empirically. To handle overlapping ambiguity strings of different forms simultaneously, a conditional random fields model is used. Several features for overlapping ambiguity resolution are explored, including component independency probability, component co-occurrence probability, in-word probability of a component, and string structure. The experimental results show that precision reaches 93.81% in the open test.
Keywords :
natural languages; text analysis; Chinese word segmentation; component co-occurrence probability; component independency probability; conditional random fields model; in-word probability; overlapping ambiguity resolution; Bayesian methods; Educational institutions; Labeling; Natural language processing; Testing
Conference_Title :
Granular Computing, 2009, GRC '09. IEEE International Conference on
Conference_Location :
Nanchang
Print_ISBN :
978-1-4244-4830-2
DOI :
10.1109/GRC.2009.5255092