Title :
A Unified Model for Joint Chinese Word Segmentation and POS Tagging with Heterogeneous Annotation Corpora
Author :
Jiayi Zhao ; Xipeng Qiu ; Xuanjing Huang
Author_Institution :
Sch. of Comput. Sci., Fudan Univ., Shanghai, China
Abstract :
Chinese word segmentation and part-of-speech tagging (S&T) are fundamental steps for more advanced Chinese language processing tasks. Recently, exploiting heterogeneous annotation corpora for Chinese S&T has attracted growing research interest. In this paper, we propose a unified model for Chinese S&T with heterogeneous annotation corpora. We first automatically construct a loose and uncertain mapping between two representative heterogeneous corpora, the Penn Chinese Treebank (CTB) and PKU's People's Daily (PPD). We then treat Chinese S&T on the heterogeneous corpora as two "related" tasks and train our unified model on both corpora simultaneously. Experiments show that our unified model can boost performance on both heterogeneous corpora by using the shared information, and achieves significant improvements over the state-of-the-art methods.
Keywords :
computational linguistics; natural language processing; CTB; Chinese S&T; Chinese language processing tasks; Chinese word segmentation; PKU People's Daily; POS tagging; PPD; Penn Chinese Treebank; heterogeneous annotation corpora; loose mapping; part-of-speech tagging; uncertain mapping; unified model; heterogeneous annotation
Conference_Title :
Asian Language Processing (IALP), 2013 International Conference on
Conference_Location :
Urumqi
DOI :
10.1109/IALP.2013.64