Title :
A comparison of part of speech taggers in the task of changing to a new domain
Author :
Boggess, Lois ; Hamaker, Janna S. ; Duncan, Richard ; Klimek, Lee ; Wu, Yufeng ; Zeng, Yu
Author_Institution :
Dept. of Comput. Sci., Mississippi State Univ., MS, USA
Abstract :
Part-of-speech tagging in real-world applications is performed on text in domains that differ from the publicly available large training data sets. The two most successful part-of-speech taggers are trained on the Wall Street Journal corpus, a corpus of millions of words. We compare their performance on a test set from a different domain, astronomy, drawn from documents that are available on the World Wide Web. The Maximum Entropy Part of Speech Tagger (MXPOST) and the Transformation-Based Learning Tagger are well known and widely used in language research and development systems. The two taggers were tested in several modes: (1) after training on the Wall Street Journal corpus only, (2) after training on only a small body of text from our astronomy domain, (3) with and without an auxiliary lexicon derived from many astronomy-related Web documents, and (4) after incremental training, that is, having been trained on the Wall Street Journal, with additional training from the specific domain. One conclusion from the experiment is that different taggers exhibit different biases when trained on the same data.
Keywords :
astronomy computing; grammars; information resources; learning (artificial intelligence); maximum entropy methods; natural languages; text analysis; MXPOST; Maximum Entropy Part of Speech Tagger; Transformation-Based Learning Tagger; Wall Street Journal corpus; World Wide Web; astronomy-related Web documents; auxiliary lexicon; bias; incremental training; language R&D systems; performance; text domains; Data mining; Electrical capacitance tomography; Entropy; Laboratories; Natural language processing; Natural languages; Research and development; Speech; Tagging; Testing;
Conference_Title :
Information Intelligence and Systems, 1999. Proceedings. 1999 International Conference on
Conference_Location :
Bethesda, MD
Print_ISBN :
0-7695-0446-9
DOI :
10.1109/ICIIS.1999.810350