Title of article

A token centric part-of-speech tagger for biomedical text

Author/Authors

Barrett, Neil and Weber-Jahnke, Jens

Issue Information

Journal with serial number, year 2014

Pages

10

From page

11

To page

20

Abstract

Objective: Difficulties with part-of-speech (POS) tagging of biomedical text include accessing and annotating appropriate training corpora. These difficulties may result in POS taggers trained on corpora that differ from the tagger's target biomedical text (cross-domain tagging). When training and target corpora differ, tagging accuracy decreases. This paper presents a POS tagger for cross-domain tagging called TcT.
Methods and material: TcT estimates a tag's likelihood for a given token by combining token collocation probabilities and the token's tag probabilities calculated using a Naive Bayes classifier. We compared TcT to three POS taggers used in the biomedical domain (mxpost, Brill and TnT). We trained each tagger on a non-biomedical corpus and evaluated it on biomedical corpora.
Results: TcT was more accurate in cross-domain tagging than mxpost, Brill and TnT (respective averages 83.9, 81.0, 79.5 and 78.8).
Conclusion: Analysis of tagger performance suggests that lexical differences between corpora have more effect on tagging accuracy than previous research considered. Biomedical POS tagging algorithms may be modified to improve their cross-domain tagging accuracy without requiring extra training or large training data sets. Future work should reexamine POS tagging methods for biomedical text, in contrast to the work to date, which has focused on retraining existing POS taggers.

Keywords

Part-of-speech tagging , Biomedical tagging evaluation , Cross-domain biomedical tagging , Token-centric tagging

Journal title

Artificial Intelligence In Medicine

Serial Year

2014

Record number

1841694