DocumentCode :
3133579
Title :
A segmentation method for crossing ambiguity string based on mutual information and t-test difference
Author :
Lu, Zhiying ; Zhao, Qianqian ; Yang, Le
Author_Institution :
Sch. of Electr. Eng. & Autom., Tianjin Univ., Tianjin, China
fYear :
2009
fDate :
20-21 Sept. 2009
Firstpage :
371
Lastpage :
374
Abstract :
One difficulty in Chinese word segmentation is ambiguity, and more than 85% of ambiguities are crossing ambiguities; reducing errors in handling crossing ambiguity is therefore significant. Taking advantage of the characteristics of crossing ambiguity strings, a novel method based on mutual information and the t-test difference is proposed to resolve ambiguities in Chinese word segmentation. First, the mutual information, which measures how closely two Chinese characters are bound, is calculated. Next, the t-test difference, which reflects the context information of the two characters, is calculated. Finally, the segmentation position is determined from the mutual information and the t-test difference. Experimental results demonstrate that the method effectively improves segmentation accuracy.
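The abstract does not state the formulas, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes pointwise mutual information MI(x, y) = log2(p(xy) / (p(x) p(y))) with add-k smoothing, a t-score t_{x,z}(y) = (p(z|y) - p(y|x)) / sqrt(var(p(z|y)) + var(p(y|x))) with each variance approximated as p/n, and a t-test difference dts(x:y) = t_{v,y}(x) - t_{x,w}(y) for the gap between x and y in the context v x y w. The class `CharStats`, the function `resolve_crossing`, the weight `alpha`, and the rule "bond = MI + alpha * dts, cut at the weaker gap" are hypothetical choices for illustration, not the authors' published method.

```python
import math
from collections import Counter


class CharStats:
    """Character unigram/bigram counts from raw (unsegmented) text."""

    def __init__(self, corpus):
        self.uni = Counter()
        self.bi = Counter()
        for sent in corpus:
            self.uni.update(sent)
            self.bi.update(sent[i:i + 2] for i in range(len(sent) - 1))
        self.n_uni = sum(self.uni.values())
        self.n_bi = max(sum(self.bi.values()), 1)

    def mi(self, x, y, k=0.5):
        """Pointwise MI log2(p(xy) / (p(x) p(y))); add-k smoothing is an assumption."""
        v = len(self.uni) or 1
        p_xy = (self.bi[x + y] + k) / (self.n_bi + k * v * v)
        p_x = (self.uni[x] + k) / (self.n_uni + k * v)
        p_y = (self.uni[y] + k) / (self.n_uni + k * v)
        return math.log2(p_xy / (p_x * p_y))

    def _cond(self, a, b):
        """p(b | a) = c(ab) / c(a) and an assumed variance approximation p / c(a)."""
        n = self.uni[a]
        if n == 0:  # also covers the "" sentence-boundary sentinel
            return 0.0, 0.0
        p = self.bi[a + b] / n
        return p, p / n

    def t_score(self, x, y, z):
        """t_{x,z}(y): positive when y binds more strongly to z than to x."""
        p_zy, v_zy = self._cond(y, z)
        p_yx, v_yx = self._cond(x, y)
        denom = math.sqrt(v_zy + v_yx)
        return 0.0 if denom == 0 else (p_zy - p_yx) / denom

    def dts(self, v, x, y, w):
        """t-test difference for the gap x|y in context v x y w; large means x, y stay joined."""
        return self.t_score(v, x, y) - self.t_score(x, y, w)


def resolve_crossing(stats, s, left="", right="", alpha=1.0):
    """Split a 3-character crossing-ambiguity string ABC into AB|C or A|BC.

    Hypothetical rule: bond = MI + alpha * dts for each gap; cut the weaker gap.
    `left` / `right` are the neighbouring characters ("" at a sentence edge).
    """
    a, b, c = s
    bond_ab = stats.mi(a, b) + alpha * stats.dts(left, a, b, c)
    bond_bc = stats.mi(b, c) + alpha * stats.dts(a, b, c, right)
    return (a + b, c) if bond_ab >= bond_bc else (a, b + c)


stats = CharStats(["结合"] * 20 + ["合成"])
print(resolve_crossing(stats, "结合成"))
```

With this toy corpus, in which 结合 is frequent and 合成 is rare, the sketch splits 结合成 as 结合|成. Adding MI (in bits) directly to dts (in t units) mixes scales; a practical system would probably normalize the two scores or tune `alpha` on held-out data.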
Keywords :
entropy; natural language processing; statistical analysis; Chinese word segmentation; crossing ambiguity string; mutual information; t-test difference; Automation; Dictionaries; Entropy; Information processing; Mutual information; Probability; Random variables; Stability; Statistics; Support vector machines; Chinese word segmentation; context; crossing ambiguity; mutual information; t-test difference;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information, Computing and Telecommunication, 2009. YC-ICT '09. IEEE Youth Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-5074-9
Electronic_ISBN :
978-1-4244-5076-3
Type :
conf
DOI :
10.1109/YCICT.2009.5382345
Filename :
5382345