DocumentCode
3300534
Title
Integrate statistical model and lexical knowledge for Chinese multiword chunking
Author
Zhou, Qiang ; Yu, Hang
Author_Institution
Centre for Speech & Language Technol., Tsinghua Univ., Beijing
fYear
2008
fDate
19-22 Oct. 2008
Firstpage
1
Lastpage
8
Abstract
Multiword chunking (MWC) is a shallow parsing technique that recognizes the external constituent and internal relation tags of each chunk in a sentence. In this paper, we propose a new solution to this problem. We design a new relation tagging scheme to represent different intra-chunk relations and run several feature-engineering experiments to select the best baseline statistical model. We also apply outside knowledge from a large-scale lexical relationship knowledge base to improve parsing performance. By integrating all of the above techniques, we develop a new Chinese MWC parser. Experimental results show that its parsing performance greatly exceeds that of a rule-based parser trained and tested on the same data set.
Keywords
knowledge based systems; natural language processing; Chinese multiword chunking; intrachunk relations; large-scale lexical relationship knowledge; relation tagging scheme; rule-based parser; shallow parsing technique; Design engineering; Information science; Labeling; Laboratories; Large-scale systems; Natural languages; Speech; Tagging; Technological innovation; Testing; Multiword chunking; Outside lexical knowledge base; Partial parsing; Relation tagging scheme;
fLanguage
English
Publisher
IEEE
Conference_Titel
Natural Language Processing and Knowledge Engineering, 2008. NLP-KE '08. International Conference on
Conference_Location
Beijing
Print_ISBN
978-1-4244-4515-8
Electronic_ISBN
978-1-4244-2780-2
Type
conf
DOI
10.1109/NLPKE.2008.4906765
Filename
4906765