DocumentCode
680202
Title
Using silver and semi-gold standard corpora to compare open named entity recognisers
Author
Groza, Tudor ; Oellrich, Anika ; Collier, Nicholson
Author_Institution
Sch. of ITEE, Univ. of Queensland, St. Lucia, QLD, Australia
fYear
2013
fDate
18-21 Dec. 2013
Firstpage
481
Lastpage
485
Abstract
Ontologies have become a central resource for defining biomedical concepts, but linking them to and from textual data remains an unresolved problem. In this paper we approach the task of concept recognition in text by comparing four existing systems (cTAKES, NCBO Annotator, BeCAS and Metamap) with default parameter settings. The systems are compared on benchmark data consisting of 2,163 scientific abstracts and 906 clinical trial reports, using an automatically constructed “silver” standard and a random semi-gold standard evaluation methodology. Furthermore, evaluation is conducted on the basis of specific concept identifiers. Experimental results show: (i) generally higher levels of concept recognition on clinical trial reports than on scientific abstracts; (ii) the best performing system we observed on the silver standard was cTAKES on both the abstract and clinical trial corpora, although NCBO Annotator performed better when considering only the selected broad semantic types; (iii) BeCAS and Metamap tended to produce coarser-grained annotations; (iv) the random semi-gold evaluation places an upper bound on the performance of systems. The semi-gold evaluation broadly agrees with the silver standard evaluation but highlights areas where the silver standard methodology might be improved.
Keywords
medical computing; ontologies (artificial intelligence); BeCAS; Metamap; NCBO Annotator; biomedical concepts; cTAKES; clinical trial corpora; clinical trial reports; coarser-grained annotations; concept recognition; default parameter settings; extant systems; ontologies; open named entity recognisers; scientific abstracts; semigold standard corpora; silver standard corpora; Abstracts; Bioinformatics; Clinical trials; Semantics; Silver; Standards; Unified modeling language
fLanguage
English
Publisher
IEEE
Conference_Titel
Bioinformatics and Biomedicine (BIBM), 2013 IEEE International Conference on
Conference_Location
Shanghai
Type
conf
DOI
10.1109/BIBM.2013.6732541
Filename
6732541