DocumentCode
3133579
Title
A segmentation method for crossing ambiguity string based on mutual information and t-test difference
Author
Lu, Zhiying ; Zhao, Qianqian ; Yang, Le
Author_Institution
Sch. of Electr. Eng. & Autom., Tianjin Univ., Tianjin, China
fYear
2009
fDate
20-21 Sept. 2009
Firstpage
371
Lastpage
374
Abstract
A major difficulty in Chinese word segmentation is ambiguity, and more than 85% of ambiguities are crossing ambiguities, so reducing errors in handling crossing ambiguity is important. Exploiting the characteristics of crossing ambiguity strings, a novel method based on mutual information and t-test difference is proposed to resolve ambiguities in Chinese word segmentation. First, the mutual information, which measures how closely two Chinese characters are associated, is calculated. Then the t-test difference, which reflects the context information of the two characters, is calculated. Finally, the segmentation position is determined from the mutual information and the t-test difference. Experimental results demonstrate that the method can effectively improve segmentation accuracy.
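The two statistics named in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so the definitions below are assumptions: pointwise mutual information over adjacent characters, and the t-test difference as it is commonly defined in the Chinese segmentation literature, with the variance of each conditional probability estimated from raw counts. All function names are hypothetical, and the paper's rule for combining the two scores into a segmentation position is not reproduced.

```python
import math
from collections import Counter

def char_stats(corpus):
    """Count single characters and adjacent character pairs in a list of strings."""
    uni, bi = Counter(), Counter()
    for sent in corpus:
        uni.update(sent)
        bi.update(zip(sent, sent[1:]))
    return uni, bi

def mutual_information(x, y, uni, bi):
    """Pointwise mutual information log2(p(x,y) / (p(x) * p(y))) of adjacent x, y."""
    if bi[(x, y)] == 0:
        return float("-inf")  # the pair was never observed
    n_uni, n_bi = sum(uni.values()), sum(bi.values())
    p_xy = bi[(x, y)] / n_bi
    return math.log2(p_xy / ((uni[x] / n_uni) * (uni[y] / n_uni)))

def t_score(x, y, z, uni, bi):
    """Assumed t-test for y in context x y z: positive if y binds to z, negative if to x."""
    if uni[x] == 0 or uni[y] == 0:
        return 0.0
    p_z_given_y = bi[(y, z)] / uni[y]
    p_y_given_x = bi[(x, y)] / uni[x]
    # Assumed variance estimates: count(pair) / count(first character)^2
    var = bi[(y, z)] / uni[y] ** 2 + bi[(x, y)] / uni[x] ** 2
    return (p_z_given_y - p_y_given_x) / math.sqrt(var) if var > 0 else 0.0

def t_test_difference(v, x, y, w, uni, bi):
    """Assumed t-test difference for x, y in context v x y w.

    Larger values suggest x and y attract each other (no boundary between them);
    smaller values suggest a boundary.
    """
    return t_score(v, x, y, uni, bi) - t_score(x, y, w, uni, bi)
```

In a crossing ambiguity string such as ABC, where both AB and BC are dictionary words, one plausible use is to compare the scores for the pair (A, B) against those for (B, C) and cut at the weaker link.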
Keywords
entropy; natural language processing; statistical analysis; Chinese word segmentation; crossing ambiguity string; mutual information; t-test difference; Automation; Dictionaries; Information processing; Probability; Random variables; Stability; Statistics; Support vector machines; context; crossing ambiguity
fLanguage
English
Publisher
IEEE
Conference_Titel
Information, Computing and Telecommunication, 2009. YC-ICT '09. IEEE Youth Conference on
Conference_Location
Beijing
Print_ISBN
978-1-4244-5074-9
Electronic_ISBN
978-1-4244-5076-3
Type
conf
DOI
10.1109/YCICT.2009.5382345
Filename
5382345