Title :
A segmentation method for crossing ambiguity string based on mutual information and t-test difference
Author :
Lu, Zhiying ; Zhao, Qianqian ; Yang, Le
Author_Institution :
Sch. of Electr. Eng. & Autom., Tianjin Univ., Tianjin, China
Abstract :
One difficulty in Chinese word segmentation is ambiguity, and more than 85% of ambiguities are crossing ambiguities; reducing errors on crossing ambiguities is therefore significant. Taking advantage of the characteristics of crossing ambiguity strings, a novel method based on mutual information and the t-test difference is proposed to resolve ambiguities in Chinese word segmentation. First, the mutual information, which measures how closely two Chinese characters are bound, is calculated. Then the t-test difference, which reflects the context information of the two characters, is calculated. Finally, the segmentation position is determined from the mutual information and the t-test difference. Experimental results demonstrate that the method effectively improves segmentation accuracy.
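The abstract names the two statistics but not their formulas or how they are combined. The Python sketch below shows one way to carry out the three steps on a character corpus. It is not the authors' implementation. It uses the following assumptions:

* Pointwise mutual information, log2(P(xy) / (P(x)P(y))), is used for adjacent characters.
* The t-test difference takes the form common in statistical Chinese segmentation work. For a gap in the context v x | y w, it is t_{v,y}(x) minus t_{x,w}(y), where t_{x,z}(y) = (P(z|y) - P(y|x)) / sqrt(var(P(z|y)) + var(P(y|x))).
* Each variance is approximated as r(ab)/r(a)^2.
* The boundary markers '^' and '$' are hypothetical.
* So are the additive combination with weight `lam` and the rule "cut the crossing string ABC at the gap with the weaker combined binding".
* All function names are illustrative.

```python
import math
from collections import Counter


def train(corpus):
    """Count characters and adjacent character pairs in a list of sentences.

    Each sentence is padded with '^' and '$' (hypothetical boundary markers)
    so that characters at sentence edges still have a left/right context.
    """
    uni, bi = Counter(), Counter()
    for sent in corpus:
        s = "^" + sent + "$"
        uni.update(s)
        bi.update(s[i:i + 2] for i in range(len(s) - 1))
    return uni, bi


def mutual_info(x, y, uni, bi):
    """Pointwise mutual information log2(P(xy) / (P(x) P(y))) of adjacent x, y."""
    n = sum(uni.values())
    p_xy = bi[x + y] / n
    if p_xy == 0:
        return float("-inf")  # pair never observed
    return math.log2(p_xy / ((uni[x] / n) * (uni[y] / n)))


def _cond(a, b, uni, bi):
    """P(b | a) and an assumed Poisson-style variance r(ab) / r(a)^2."""
    r_a, r_ab = uni[a], bi[a + b]
    if r_a == 0:
        return 0.0, 0.0
    return r_ab / r_a, r_ab / r_a ** 2


def t_score(x, y, z, uni, bi):
    """t_{x,z}(y): > 0 if y leans towards its right neighbour z, < 0 if towards x."""
    p_right, var_right = _cond(y, z, uni, bi)  # P(z | y)
    p_left, var_left = _cond(x, y, uni, bi)    # P(y | x)
    denom = math.sqrt(var_right + var_left)
    return 0.0 if denom == 0 else (p_right - p_left) / denom


def t_diff(s, i, uni, bi):
    """t-test difference for the gap between s[i-1] and s[i] in context v x | y w."""
    v, x, y, w = s[i - 2], s[i - 1], s[i], s[i + 1]
    # How strongly x leans right towards y, minus how strongly y leans right
    # away from x: large values suggest x and y belong to the same word.
    return t_score(v, x, y, uni, bi) - t_score(x, y, w, uni, bi)


def split_crossing(sent, start, uni, bi, lam=1.0):
    """Resolve a crossing ambiguity ABC = sent[start:start + 3].

    Returns (A, BC) or (AB, C). The weight `lam` and the rule
    'cut at the gap with the weaker combined binding' are hypothetical.
    """
    s = "^" + sent + "$"
    a = start + 1  # position of A in the padded string

    def binding(i):  # combined association across the gap before s[i]
        return mutual_info(s[i - 1], s[i], uni, bi) + lam * t_diff(s, i, uni, bi)

    first, mid, last = sent[start], sent[start + 1], sent[start + 2]
    if binding(a + 1) < binding(a + 2):  # gap A|B weaker than gap B|C
        return first, mid + last
    return first + mid, last


if __name__ == "__main__":
    # Toy corpus invented for illustration: "bc" is a frequent unit, "ab" is rare.
    uni, bi = train(["bc"] * 3 + ["a"] * 3 + ["abc"])
    print(split_crossing("abc", 0, uni, bi))  # ('a', 'bc')
```

With a real corpus, the same functions take Chinese sentences directly, because Python strings index by character. The weight `lam` would have to be tuned on held-out data.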
Keywords :
entropy; natural language processing; statistical analysis; Chinese word segmentation; crossing ambiguity string; mutual information; t-test difference; Automation; Dictionaries; Entropy; Information processing; Mutual information; Probability; Random variables; Stability; Statistics; Support vector machines; Chinese word segmentation; context; crossing ambiguity; mutual information; t-test difference;
Conference_Title :
Information, Computing and Telecommunication, 2009. YC-ICT '09. IEEE Youth Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-5074-9
Electronic_ISBN :
978-1-4244-5076-3
DOI :
10.1109/YCICT.2009.5382345