Title :
Context-Based Approach for Covering Ambiguity Resolution in Chinese Word Segmentation
Author :
Feng, Su-Qin ; Hou, Su-qin
Author_Institution :
Dept. of Comput. Sci. & Technol., Teachers Univ. ShanXi, Xinzhou, China
Abstract :
Covering ambiguity is a key issue in Chinese word segmentation. The challenge is that disambiguation depends on contextual information. This paper collects contextual statistics for covering-ambiguity words and establishes a context calculation model based on the log likelihood ratio. A weighting formula is designed to account for the window size and position of the contextual information and for the influence of frequency on covering ambiguity. On this basis, two methods are used for disambiguation. The first uses the maximum log likelihood ratio in the contextual information; the second uses the larger of the summed log likelihood ratios for the combined and separated readings in the contextual information. Fourteen frequently occurring covering-ambiguity words are used as examples. The average accuracy of the first method reaches 84.93%, and that of the second reaches 95.60%. The experimental results show that combining contextual information is effective for disambiguation.
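The two decision rules described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulas, so everything here is an assumption: `llr` is the standard log likelihood ratio (G^2) for a 2x2 contingency table rather than the authors' weighted variant, the window-size, position, and frequency weighting is omitted, and the names `decide_by_max`, `decide_by_sum`, and the labels "combine" and "separate" are hypothetical.

```python
import math

def llr(k11, k12, k21, k22):
    """Log likelihood ratio (G^2) for a 2x2 contingency table.

    Assumption: the generic G^2 statistic, not the paper's weighted formula.
    k11 = context word seen with the reading, k12 = context word without it,
    k21 = reading without the context word, k22 = neither.
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing to the sum
                expected = rows[i] * cols[j] / n
                g2 += o * math.log(o / expected)
    return 2.0 * g2

def decide_by_max(scores):
    """Method 1 (sketch): take the reading of the single highest-scoring context word."""
    return max(scores, key=lambda s: s[1])[0]

def decide_by_sum(scores):
    """Method 2 (sketch): sum the scores per reading and take the larger total."""
    totals = {}
    for label, value in scores:
        totals[label] = totals.get(label, 0.0) + value
    return max(totals, key=totals.get)

# Hypothetical scores for the context words around one ambiguous string.
example_scores = [("combine", 9.0), ("separate", 5.0), ("separate", 6.0)]
print(decide_by_max(example_scores))
print(decide_by_sum(example_scores))
```

In this made-up example the two rules disagree: one strong context word favours the combined reading, while the accumulated evidence favours the separated one. This is the kind of case in which summing over the context can change the decision.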
Keywords :
computational linguistics; natural language processing; Chinese word segmentation; context-based approach; contextual information; covering ambiguity resolution; log likelihood ratio; weighing calculation formula; Algorithm design and analysis; Arithmetic; Chemical engineering; Computer science; Frequency; Large-scale systems; Natural languages; Statistics; Testing; covering ambiguity
Conference_Titel :
Information and Computing Science, 2009. ICIC '09. Second International Conference on
Conference_Location :
Manchester
Print_ISBN :
978-0-7695-3634-7
DOI :
10.1109/ICIC.2009.119