• DocumentCode
    2735814
  • Title
    Automatic Identification of Chinese Multiword Chunk Based on CRF
  • Author
    Li, Ru ; Zhong, Lijun ; Li, Shuanghong ; Zhang, Zezheng
  • Author_Institution
    Sch. of Comput. & Inf. Technol., Shanxi Univ., Taiyuan, China
  • Volume
    3
  • fYear
    2010
  • fDate
    Aug. 31 2010-Sept. 3 2010
  • Firstpage
    174
  • Lastpage
    177
  • Abstract
    Automatic identification of Chinese multiword chunks is a recently emerged technique in the NLP field. As a new strategy, it can effectively improve the performance of syntactic parsing. This work follows the standard description system for Chinese multiword chunks and constructs two tag sequence models based on the CRF model, named "the syntactic mark tagging list model" and "the sequence mark tagging list model" respectively. The corpus used for training is "the Chinese multiword chunk bank", provided by Tsinghua University. In the experiments, selecting appropriate features and introducing some important rules yields better results, and the system for identifying Chinese multiword chunks runs well in a restricted domain. It thus provides a bridge between syntax and semantic content.
  • Keywords
    grammars; natural language processing; random processes; word processing; CRF model; Chinese multiword chunk; NLP field; Tsinghua University; conditional random field; sequence mark tagging list model; syntactic mark tagging list model; syntactic parsing; tag sequence model; Periodic structures; Semantics; Syntactics; Tagging; Topology; Training; Vocabulary; CRF; Multiword; chunk parsing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on
  • Conference_Location
    Toronto, ON
  • Print_ISBN
    978-1-4244-8482-9
  • Electronic_ISBN
    978-0-7695-4191-4
  • Type
    conf
  • DOI
    10.1109/WI-IAT.2010.158
  • Filename
    5614356