DocumentCode :
76826
Title :
Joint Optimization for Chinese POS Tagging and Dependency Parsing
Author :
Zhenghua Li ; Min Zhang ; Wanxiang Che ; Ting Liu ; Wenliang Chen
Author_Institution :
Sch. of Comput. Sci. & Technol., Soochow Univ., Suzhou, China
Volume :
22
Issue :
1
fYear :
2014
fDate :
Jan. 2014
Firstpage :
274
Lastpage :
286
Abstract :
Dependency parsing has attracted growing interest in natural language processing in recent years due to its simplicity and its general applicability to diverse languages. Previous work demonstrates that part-of-speech (POS) tags are an indispensable feature in dependency parsing, since purely lexical features suffer from a serious data sparseness problem. However, because Chinese exhibits little morphological change, Chinese POS tagging has proven to be much more challenging than POS tagging for morphologically richer languages such as English (94% vs. 97% tagging accuracy). This leads to severe error propagation in Chinese dependency parsing. Our experiments show that parsing accuracy drops by about 6% when the manual POS tags of the input sentence are replaced with automatic ones generated by a state-of-the-art statistical POS tagger. To address this issue, this paper proposes jointly optimizing POS tagging and dependency parsing in a single model. We propose several dynamic-programming-based decoding algorithms for our joint models that can incorporate rich POS tagging and syntactic features. We then present an effective pruning strategy that reduces the search space of candidate POS tags, leading to a significant improvement in parsing speed. Experimental results on two Chinese data sets, Penn Chinese Treebank 5.1 and Penn Chinese Treebank 7, demonstrate that our joint models significantly improve on both the state-of-the-art tagging and parsing accuracies. Detailed analysis shows that the joint method can help resolve syntax-sensitive POS ambiguities {NN, VV}. In return, the POS tags become more reliable and more helpful for parsing, since syntactic features are used in POS tagging. This is the fundamental reason for the performance improvement.
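The abstract mentions a pruning strategy that shrinks the set of candidate POS tags the joint decoder must consider, but does not give its details. The following is a minimal sketch under assumed details, not the authors' exact procedure: it assumes a baseline tagger supplies per-word marginal tag probabilities, and keeps for each word only the tags whose probability is at least a fraction `ratio` of that word's best tag, capped at `max_tags` tags. The names `prune_pos_candidates`, `search_space_size`, `ratio`, and `max_tags` are hypothetical, and the probabilities in the usage example are made up for illustration.

```python
from typing import Dict, List


def prune_pos_candidates(
    marginals: List[Dict[str, float]],
    ratio: float = 0.1,
    max_tags: int = 4,
) -> List[List[str]]:
    """Hypothetical pruning rule (an assumption, not the paper's exact one):
    for each word, keep tags whose marginal probability is at least
    `ratio` times that of the word's best tag, keeping at most `max_tags`."""
    pruned: List[List[str]] = []
    for i, dist in enumerate(marginals):
        if not dist:
            raise ValueError(f"word {i} has no candidate tags")
        # Sort by descending probability; break ties by tag name for determinism.
        ranked = sorted(dist.items(), key=lambda kv: (-kv[1], kv[0]))
        best_p = ranked[0][1]
        kept = [tag for tag, p in ranked if p >= ratio * best_p]
        pruned.append(kept[:max_tags])
    return pruned


def search_space_size(candidates: List[List[str]]) -> int:
    """Number of distinct tag sequences a joint decoder would have to consider."""
    size = 1
    for tags in candidates:
        size *= len(tags)
    return size


if __name__ == "__main__":
    # Made-up marginals for a three-word sentence; the first word is NN/VV ambiguous.
    sentence = [
        {"NN": 0.55, "VV": 0.44, "JJ": 0.01},
        {"P": 0.98, "VV": 0.02},
        {"NN": 0.999, "NR": 0.001},
    ]
    candidates = prune_pos_candidates(sentence, ratio=0.1)
    print(candidates)
    print(search_space_size([list(d) for d in sentence]), "->", search_space_size(candidates))
```

With these inputs only the genuinely ambiguous first word keeps two tags (NN and VV), so the number of tag sequences drops from 12 to 2. A relative threshold like this keeps syntax-sensitive ambiguities such as {NN, VV} open for the joint model to resolve while discarding clearly unlikely tags.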
Keywords :
dynamic programming; grammars; natural language processing; search problems; statistical analysis; Chinese POS tagging; Chinese dependency parsing; Penn Chinese Treebank 7; Penn Chinese Treebank 5.1; diverse languages; dynamic programming based decoding algorithms; error propagation; natural language processing; parsing speed improvement; part-of-speech; pruning strategy; search space reduce; statistical POS tagger; syntax-sensitive POS ambiguities; Accuracy; Decoding; Joints; Pipelines; Syntactics; Tagging; Vectors; Dependency parsing; dynamic programming; joint models; part-of-speech tagging;
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
Publisher :
ieee
ISSN :
2329-9290
Type :
jour
DOI :
10.1109/TASLP.2013.2288081
Filename :
6651773