• DocumentCode
    3133579
  • Title

    A segmentation method for crossing ambiguity string based on mutual information and t-test difference

  • Author

    Lu, Zhiying ; Zhao, Qianqian ; Yang, Le

  • Author_Institution
    Sch. of Electr. Eng. & Autom., Tianjin Univ., Tianjin, China
  • fYear
    2009
  • fDate
    20-21 Sept. 2009
  • Firstpage
    371
  • Lastpage
    374
  • Abstract
    One difficulty in Chinese word segmentation is ambiguity, more than 85% of which is crossing ambiguity, so reducing errors in handling crossing ambiguity is important. Exploiting the characteristics of the crossing ambiguity string, a novel method based on mutual information and t-test difference is proposed to resolve ambiguities in Chinese word segmentation. First, the mutual information, which measures how closely two Chinese characters are associated, is calculated. Then the t-test difference, which reflects the context information of the two characters, is calculated. Finally, the segmentation position is determined from the mutual information and the t-test difference. The experimental results demonstrate that segmentation accuracy can be effectively improved.
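    The two statistics named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the formulas, so the sketch assumes the common corpus-count definitions (pointwise mutual information of adjacent characters, and a t-score with count-based variance estimates). The function names, the toy-corpus interface, and the variance approximation are all assumptions, and how the paper combines the two scores to pick the cut position is not stated in the abstract, so it is left out.

```python
import math
from collections import Counter


def train_counts(corpus):
    """Count single characters and adjacent character pairs in a list of sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        unigrams.update(sentence)
        bigrams.update(zip(sentence, sentence[1:]))
    return unigrams, bigrams


def mutual_information(x, y, unigrams, bigrams):
    """Pointwise mutual information log2(p(x,y) / (p(x) * p(y))) of adjacent x, y.

    Assumed definition; returns -inf for a pair never seen in the corpus.
    """
    if bigrams[(x, y)] == 0:
        return float("-inf")
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    p_xy = bigrams[(x, y)] / n_bi
    p_x = unigrams[x] / n_uni
    p_y = unigrams[y] / n_uni
    return math.log2(p_xy / (p_x * p_y))


def t_score(x, y, z, unigrams, bigrams):
    """t-score of y in the context x y z (assumed definition).

    Positive: y tends to attach to z on its right.
    Negative: y tends to attach to x on its left.
    """
    if unigrams[x] == 0 or unigrams[y] == 0:
        return 0.0
    p_z_given_y = bigrams[(y, z)] / unigrams[y]
    p_y_given_x = bigrams[(x, y)] / unigrams[x]
    # Count-based variance estimates (an assumption of this sketch).
    variance = (bigrams[(y, z)] / unigrams[y] ** 2
                + bigrams[(x, y)] / unigrams[x] ** 2)
    if variance == 0:
        return 0.0
    return (p_z_given_y - p_y_given_x) / math.sqrt(variance)


def t_test_difference(v, x, y, w, unigrams, bigrams):
    """t-test difference for the link x:y inside the context v x y w.

    Large positive values suggest x and y belong together;
    large negative values suggest a boundary between them.
    """
    return (t_score(v, x, y, unigrams, bigrams)
            - t_score(x, y, w, unigrams, bigrams))
```

    For a crossing ambiguity string ABC in which both AB and BC are dictionary words, one would compute these scores for the A:B link and the B:C link and cut at the weaker one; the exact decision rule used in the paper is not given here.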
  • Keywords
    entropy; natural language processing; statistical analysis; Chinese word segmentation; crossing ambiguity string; mutual information; t-test difference; Automation; Dictionaries; Information processing; Probability; Random variables; Stability; Statistics; Support vector machines; context; crossing ambiguity
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information, Computing and Telecommunication, 2009. YC-ICT '09. IEEE Youth Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-5074-9
  • Electronic_ISBN
    978-1-4244-5076-3
  • Type

    conf

  • DOI
    10.1109/YCICT.2009.5382345
  • Filename
    5382345