• DocumentCode
    3300534
  • Title

    Integrate statistical model and lexical knowledge for Chinese multiword chunking

  • Author

    Zhou, Qiang ; Yu, Hang

  • Author_Institution
    Centre for Speech & Language Technol., Tsinghua Univ., Beijing
  • fYear
    2008
  • fDate
    19-22 Oct. 2008
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    Multiword chunking (MWC) is a shallow parsing technique that recognizes the external constituent and internal relation tags of each chunk in a sentence. In this paper, we propose a new solution to this problem. We design a new relation tagging scheme to represent the different intra-chunk relations and run several feature-engineering experiments to select the best baseline statistical model. We also apply outside knowledge from a large-scale lexical relationship knowledge base to improve parsing performance. By integrating all of the above techniques, we develop a new Chinese MWC parser. Experimental results show that its parsing performance greatly exceeds that of a rule-based parser trained and tested on the same data set.
  • Keywords
    knowledge based systems; natural language processing; Chinese multiword chunking; intrachunk relations; large-scale lexical relationship knowledge; relation tagging scheme; rule-based parser; shallow parsing technique; Design engineering; Information science; Labeling; Laboratories; Large-scale systems; Natural languages; Speech; Tagging; Technological innovation; Testing; Multiword chunking; Outside lexical knowledge base; Partial parsing; Relation tagging scheme;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering, 2008. NLP-KE '08. International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-4515-8
  • Electronic_ISBN
    978-1-4244-2780-2
  • Type

    conf

  • DOI
    10.1109/NLPKE.2008.4906765
  • Filename
    4906765