  • DocumentCode
    2347842
  • Title
    A morphology-based Chinese word segmentation method
  • Author
    Lin, Xiaojun; Zhao, Liang; Zhang, Meng; Wu, Xihong
  • Author_Institution
    Key Lab. of Machine Perception & Intell., Speech & Hearing Res. Center, Peking Univ., Beijing, China
  • fYear
    2010
  • fDate
    21-23 Aug. 2010
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    This paper proposes a novel Chinese word segmentation method that exploits morphological information. The method introduces morphology into a statistical model to capture the structural relationships within words, which improves the ability of conventional Conditional Random Field (CRF) models to represent structural information. First, a word-segmented Chinese corpus is annotated with morphology tags by a semi-automatic method, and the resulting structure-related tags are integrated into the CRF model. Second, a joint CRF model is trained that generates both morphology tags and word boundaries. Experiments on several SIGHAN Bakeoff corpora show that morphological information significantly improves Chinese word segmentation performance, especially for out-of-vocabulary words.
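    A joint CRF model of this kind is usually trained on per-character labels. The sketch below shows one plausible way to build such labels, assuming the common BMES boundary scheme (Begin/Middle/End/Single) and a joint label formed as "boundary-morphology". The BMES scheme, the label format, and the morphology tag names (SUF, SING, COMP) are illustrative assumptions, not the tag set used in the paper; the CRF training itself is not shown.

```python
from typing import List, Tuple

def bmes_tags(words: List[str]) -> List[Tuple[str, str]]:
    """Map a segmented sentence to (character, boundary tag) pairs using BMES."""
    pairs = []
    for word in words:
        if len(word) == 1:
            pairs.append((word, "S"))          # single-character word
        else:
            pairs.append((word[0], "B"))       # first character of a word
            for ch in word[1:-1]:
                pairs.append((ch, "M"))        # inner characters
            pairs.append((word[-1], "E"))      # last character of a word
    return pairs

def joint_tags(words: List[str], morph_tags: List[str]) -> List[Tuple[str, str]]:
    """Attach each word's (hypothetical) morphology tag to its characters' boundary tags."""
    if len(words) != len(morph_tags):
        raise ValueError("need exactly one morphology tag per word")
    out = []
    for word, morph in zip(words, morph_tags):
        for ch, boundary in bmes_tags([word]):
            out.append((ch, f"{boundary}-{morph}"))
    return out

def decode_words(chars: List[str], tags: List[str]) -> List[str]:
    """Recover words from predicted joint tags using only the boundary prefix."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        boundary = tag.split("-", 1)[0]
        current += ch
        if boundary in ("S", "E"):             # a word ends at S or E
            words.append(current)
            current = ""
    if current:                                # flush an unterminated word
        words.append(current)
    return words

if __name__ == "__main__":
    pairs = joint_tags(["我们", "去", "图书馆"], ["SUF", "SING", "COMP"])
    print(pairs)
    chars = [c for c, _ in pairs]
    tags = [t for _, t in pairs]
    print(decode_words(chars, tags))
```

    Because segmentation is read off the boundary prefix alone, a single tagger can output both morphology and word boundaries while decoding stays as simple as in a plain BMES segmenter.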
  • Keywords
    computational linguistics; learning (artificial intelligence); natural language processing; statistical analysis; text analysis; SIGHAN bakeoff corpus; conditional random fields; morphology-based Chinese word segmentation method; statistical model; Morphology; Testing; Training; Chinese word segmentation; words out of vocabulary
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering (NLP-KE), 2010 International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-6896-6
  • Type
    conf
  • DOI
    10.1109/NLPKE.2010.5587786
  • Filename
    5587786