DocumentCode :
652074
Title :
Dealing with Data Sparsity in Drug Named Entity Recognition
Author :
Piliouras, Dimitrios ; Korkontzelos, Ioannis ; Dowsey, Andrew ; Ananiadou, Sophia
Author_Institution :
Sch. of Comput. Sci., Univ. of Manchester, Manchester, UK
fYear :
2013
fDate :
9-11 Sept. 2013
Firstpage :
14
Lastpage :
21
Abstract :
Drug Named Entity Recognition (drug-NER) is a critical step for complex Biomedical Natural Language Processing (BioNLP) tasks such as the extraction of pharmaco-genomic, pharmaco-dynamic and pharmaco-kinetic parameters. Large quantities of high-quality training data are almost always a prerequisite for employing supervised machine-learning (ML) techniques to achieve high classification performance. However, the human labour needed to produce and maintain such resources is a significant limitation. In this study, we attempt to improve the performance of drug-NER without relying exclusively on manual annotations. Instead, we use either a small gold-standard corpus (120 abstracts) or no corpus at all. In our approach, we use a voting system to combine a number of heterogeneous models to enhance performance. Moreover, 11 regular expressions that capture common drug suffixes were evolved via genetic programming. We evaluate our approach against state-of-the-art recognisers trained on manual annotations, automatic annotations and a mixture of both. Aggregate classifiers are shown to improve performance, achieving a maximum F-score of 95%. In addition, combined models trained on mixed data are shown to achieve performance comparable to that of models trained exclusively on gold-standard data.
Keywords :
drugs; genetic algorithms; medical computing; natural language processing; pattern classification; BioNLP tasks; automatic annotations; biomedical natural language processing tasks; data sparsity; drug named entity recognition; drug-NER; genetic-programming; gold-standard data; manual annotations; voting system; Data models; Dictionaries; Drugs; Proteins; Silver; Training; Training data; data-sparsity; drug-NER; genetic-programming
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Healthcare Informatics (ICHI), 2013 IEEE International Conference on
Conference_Location :
Philadelphia, PA
Type :
conf
DOI :
10.1109/ICHI.2013.9
Filename :
6680456