DocumentCode
495048
Title
Context-Based Approach for Covering Ambiguity Resolution in Chinese Word Segmentation
Author
Feng, Su-Qin ; Hou, Su-qin
Author_Institution
Dept. of Comput. Sci. & Technol., Teachers Univ. ShanXi, Xinzhou, China
Volume
2
fYear
2009
fDate
21-22 May 2009
Firstpage
43
Lastpage
46
Abstract
Covering ambiguity is a key issue in Chinese word segmentation. The challenge is that disambiguation depends on contextual information. This paper collects contextual statistics for covering-ambiguity words and derives a context calculation model based on the log likelihood ratio. A weighting formula is designed to account for the window size and position of the contextual information and for the influence of frequency on covering ambiguity. Based on this, two methods are used for disambiguation. The first uses the maximum log likelihood ratio in the contextual information; the second uses the larger of the summed log likelihood ratios for the combined and separated readings of the contextual information. Fourteen frequently occurring covering-ambiguity words are used as examples. The average accuracy of the first method reaches 84.93%, and that of the second reaches 95.60%. The experimental results show that combining contextual information is effective for disambiguation.
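The two decision rules named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the weighting formula, so no weighting is applied here, and the function names (llr, decide_by_max, decide_by_sum), the 2x2 contingency-table form of the log likelihood ratio, and the sample scores are all assumptions made for illustration.

```python
import math


def llr(k11, k12, k21, k22):
    """Log likelihood ratio (G^2) for a 2x2 contingency table of counts.

    Assumed layout: k11 = context word seen with the reading, k12 = context
    word seen without it, k21 = reading seen without the context word,
    k22 = neither.
    """
    def h(*counts):
        total = sum(counts)
        return sum(c * math.log(c / total) for c in counts if c > 0)

    return 2.0 * (h(k11, k12, k21, k22)
                  - h(k11 + k12, k21 + k22)   # row totals
                  - h(k11 + k21, k12 + k22))  # column totals


def decide_by_max(scores):
    """Pick the reading that owns the single largest score.

    scores: list of (score_combined, score_separated), one pair per
    context word found in the window (hypothetical input format).
    """
    best_combined = max(c for c, _ in scores)
    best_separated = max(s for _, s in scores)
    return "combined" if best_combined >= best_separated else "separated"


def decide_by_sum(scores):
    """Pick the reading whose scores have the larger total."""
    total_combined = sum(c for c, _ in scores)
    total_separated = sum(s for _, s in scores)
    return "combined" if total_combined >= total_separated else "separated"


# Illustrative scores only, not data from the paper.
example_scores = [(9.0, 1.0), (2.0, 6.0), (1.0, 6.0)]
```

With these made-up scores, one strongly associated context word decides the first rule, while the second rule pools evidence from every context word in the window, so the two rules can disagree on the same input.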
Keywords
computational linguistics; natural language processing; Chinese word segmentation; context-based approach; contextual information; covering ambiguity resolution; log likelihood ratio; weighing calculation formula; Algorithm design and analysis; Arithmetic; Chemical engineering; Computer science; Frequency; Large-scale systems; Natural languages; Statistics; Testing; covering ambiguity
fLanguage
English
Publisher
IEEE
Conference_Titel
Information and Computing Science, 2009. ICIC '09. Second International Conference on
Conference_Location
Manchester
Print_ISBN
978-0-7695-3634-7
Type
conf
DOI
10.1109/ICIC.2009.119
Filename
5169003