Title :
Phrase-based Part-of-Speech Tagging
Author :
Finch, Andrew ; Sumita, Eiichiro
Author_Institution :
NICT-ATR, Kyoto
Date :
Aug. 30 2007-Sept. 1 2007
Abstract :
This paper presents a new approach to part-of-speech (POS) tagging in which the basic unit being tagged is a contiguous sequence of words rather than a single word. We run experiments on two different tagsets: the UPENN treebank and a treebank annotated with more ambiguous tags that have a semantic component. We show that the phrase-based system alone is a respectable tagger that exceeds the performance of the maximum entropy (ME) tagger on the ambiguous tagset. Moreover, when a log-linear model is built using features from both phrase- and word-based techniques, tagging accuracy improves on both of our data sets, yielding the highest reported performance to date on the more ambiguous tagset.
Keywords :
natural languages; text analysis; UPENN treebank; contiguous word sequence; log-linear model; phrase-based part-of-speech tagging; Context modeling; Degradation; Entropy; History; Labeling; Parameter estimation; Predictive models; Radio access networks; System performance; Tagging;
Conference_Title :
Natural Language Processing and Knowledge Engineering, 2007. NLP-KE 2007. International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-1611-0
Electronic_ISBN :
978-1-4244-1611-0
DOI :
10.1109/NLPKE.2007.4368036