DocumentCode :
3381484
Title :
A comparison of part of speech taggers in the task of changing to a new domain
Author :
Boggess, Lois ; Hamaker, Janna S. ; Duncan, Richard ; Klimek, Lee ; Wu, Yufeng ; Zeng, Yu
Author_Institution :
Dept. of Comput. Sci., Mississippi State Univ., MS, USA
fYear :
1999
fDate :
1999
Firstpage :
574
Lastpage :
578
Abstract :
Part-of-speech tagging in real-world applications is performed on text from domains that differ from those of the publicly available large training data sets. The two most successful part-of-speech taggers are trained on the Wall Street Journal corpus, a corpus of millions of words. We compare their performance on a test set from a different domain, astronomy, drawn from documents that are available on the World Wide Web. The Maximum Entropy Part of Speech Tagger (MXPOST) and the Transformation-Based Learning Tagger are well known and widely used in language research and development systems. The two taggers were tested in several modes: (1) after training on the Wall Street Journal corpus only, (2) after training on only a small body of text from our astronomy domain, (3) with and without an auxiliary lexicon derived from many astronomy-related Web documents, and (4) after incremental training, that is, having been trained on the Wall Street Journal corpus, with additional training on text from the specific domain. One conclusion from the experiment is that different taggers exhibit different biases when trained on the same data.
Keywords :
astronomy computing; grammars; information resources; learning (artificial intelligence); maximum entropy methods; natural languages; text analysis; MXPOST; Maximum Entropy Part of Speech Tagger; Transformation-Based Learning Tagger; Wall Street Journal corpus; World Wide Web; astronomy-related Web documents; auxiliary lexicon; bias; incremental training; language R&D systems; performance; text domains; Data mining; Electrical capacitance tomography; Entropy; Laboratories; Natural language processing; Natural languages; Research and development; Speech; Tagging; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information Intelligence and Systems, 1999. Proceedings. 1999 International Conference on
Conference_Location :
Bethesda, MD
Print_ISBN :
0-7695-0446-9
Type :
conf
DOI :
10.1109/ICIIS.1999.810350
Filename :
810350