• DocumentCode
    1938327
  • Title
    A New Machine Learning Method for Chinese Overlapping Disambiguity--Conditional Random Fields
  • Author
    Xiong, Ying ; Zhu, Jie
  • Author_Institution
    Shanghai Jiao Tong Univ., Shanghai
  • Volume
    7
  • fYear
    2007
  • fDate
    19-22 Aug. 2007
  • Firstpage
    3922
  • Lastpage
    3926
  • Abstract
    Conditional random fields (CRFs) are employed in this paper to resolve Chinese overlapping ambiguity in Chinese word segmentation. Unlike traditional methods, which treat Chinese overlapping ambiguity as a classification problem, the proposed approach regards the task as a sequence labeling problem. The main advantage of this method is that it can handle overlapping ambiguous strings of any length, whether the ambiguity is pseudo ambiguity or true ambiguity. Several methods are tested on the same training and test corpora. The experimental results show that the CRF models achieve state-of-the-art performance: compared with the maximum entropy classifier and the traditional word bigram model, accuracy increases by 3.98% and 9.27%, respectively.
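    The abstract's central idea, recasting overlapping-ambiguity resolution as sequence labeling rather than classification, can be illustrated by tagging each character and decoding the best legal tag sequence with Viterbi, as a linear-chain CRF does at inference time. This is a minimal sketch under stated assumptions, not the authors' implementation: the BMES tag set, the hand-set emission and transition scores, and the names `viterbi` and `tags_to_words` are hypothetical. A trained CRF would learn its weights from feature functions over a corpus. The string 结合成分子 is used only as an illustration of overlapping ambiguity (结合/成 vs. 结/合成, and 成/分子 vs. 成分/子).

    ```python
    NEG_INF = float("-inf")
    TAGS = ("B", "M", "E", "S")  # Begin/Middle/End of a multi-char word; Single-char word

    # Hypothetical transition weights: 0.0 for legal BMES moves.
    # Any pair missing from this dict is treated as illegal (score -inf).
    TRANSITIONS = {
        ("B", "M"): 0.0, ("B", "E"): 0.0,
        ("M", "M"): 0.0, ("M", "E"): 0.0,
        ("E", "B"): 0.0, ("E", "S"): 0.0,
        ("S", "B"): 0.0, ("S", "S"): 0.0,
    }


    def viterbi(emissions, transitions=TRANSITIONS):
        """Return the highest-scoring legal tag sequence for a linear-chain model.

        emissions: one dict per character mapping tag -> (hypothetical) score.
        """
        # A sentence can only start with B or S.
        best = [{t: (emissions[0][t] if t in ("B", "S") else NEG_INF) for t in TAGS}]
        back = [{}]
        for i in range(1, len(emissions)):
            scores, pointers = {}, {}
            for cur in TAGS:
                # Best previous tag given the accumulated score plus the transition weight.
                prev = max(TAGS, key=lambda p: best[-1][p] + transitions.get((p, cur), NEG_INF))
                scores[cur] = best[-1][prev] + transitions.get((prev, cur), NEG_INF) + emissions[i][cur]
                pointers[cur] = prev
            best.append(scores)
            back.append(pointers)
        # A sentence can only end with E or S.
        last = max(("E", "S"), key=lambda t: best[-1][t])
        path = [last]
        for i in range(len(emissions) - 1, 0, -1):
            path.append(back[i][path[-1]])
        return path[::-1]


    def tags_to_words(chars, tags):
        """Group characters into words, closing a word at each E or S tag."""
        words, current = [], ""
        for ch, tag in zip(chars, tags):
            current += ch
            if tag in ("E", "S"):
                words.append(current)
                current = ""
        if current:
            words.append(current)
        return words


    chars = "结合成分子"
    # Hypothetical per-character scores. Taken greedily (per-position argmax),
    # they would give B E S E E, which is illegal because S -> E is not allowed.
    emissions = [
        {"B": 2.0, "M": 0.0, "E": 0.0, "S": 0.5},  # 结
        {"B": 0.0, "M": 0.5, "E": 2.0, "S": 0.3},  # 合
        {"B": 1.0, "M": 0.0, "E": 0.2, "S": 1.2},  # 成
        {"B": 1.0, "M": 0.1, "E": 1.1, "S": 0.2},  # 分
        {"B": 0.0, "M": 0.0, "E": 1.5, "S": 0.8},  # 子
    ]
    tags = viterbi(emissions)
    print(tags)                        # ['B', 'E', 'S', 'B', 'E']
    print(tags_to_words(chars, tags))  # ['结合', '成', '分子']
    ```

    Because decoding scores the whole sequence jointly, the same procedure applies unchanged to ambiguous strings of any length. This is the property the abstract contrasts with per-string classification.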
  • Keywords
    entropy; learning (artificial intelligence); natural language processing; pattern classification; random processes; Chinese overlapping ambiguity; Chinese overlapping disambiguity; Chinese word segmentation; classification problem; conditional random fields; machine learning method; sequence labeling problem; Cybernetics; Educational institutions; Entropy; Hidden Markov models; Humans; Labeling; Learning systems; Machine learning; Support vector machines; Testing; Chinese word segmentation; Conditional random fields; Maximum Entropy classifier; Overlapping ambiguity; Word bigram model;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Cybernetics, 2007 International Conference on
  • Conference_Location
    Hong Kong
  • Print_ISBN
    978-1-4244-0973-0
  • Electronic_ISBN
    978-1-4244-0973-0
  • Type
    conf
  • DOI
    10.1109/ICMLC.2007.4370831
  • Filename
    4370831