DocumentCode :
680202
Title :
Using silver and semi-gold standard corpora to compare open named entity recognisers
Author :
Groza, Tudor ; Oellrich, Anika ; Collier, Nicholson
Author_Institution :
Sch. of ITEE, Univ. of Queensland, St. Lucia, QLD, Australia
fYear :
2013
fDate :
18-21 Dec. 2013
Firstpage :
481
Lastpage :
485
Abstract :
Ontologies have become a central resource for defining biomedical concepts, but linkage to and from textual data remains an unresolved problem. In this paper we approach the task of concept recognition in text by comparing four extant systems (cTAKES, NCBO Annotator, BeCAS and Metamap) with default parameter settings. The systems are compared on benchmark data consisting of 2,163 scientific abstracts and 906 clinical trial reports using an automatically constructed “silver” standard and a random semi-gold standard evaluation methodology. Furthermore, evaluation is conducted on the basis of specific concept identifiers. Experimental results show: (i) generally higher levels of concept recognition on clinical trial reports than on scientific abstracts; (ii) the best performing system we observed on the silver standard was cTAKES on both the abstract and clinical trial corpora, although NCBO Annotator performed better when considering only the selected broad semantic types; (iii) BeCAS and Metamap tended to produce coarser-grained annotations; (iv) the random semi-gold evaluation places an upper bound on the performance of systems. The semi-gold evaluation broadly agrees with the silver standard evaluation but highlights areas where the silver standard methodology might be improved.
Keywords :
medical computing; ontologies (artificial intelligence); BeCAS; Metamap; NCBO Annotator; NCBO annotator; biomedical concepts; cTAKES; clinical trial corpora; clinical trial reports; coarser-grained annotations; concept recognition; default parameter settings; extant systems; metamap; ontologies; open named entity recognisers; scientific abstracts; semigold standard corpora; silver standard corpora; Abstracts; Bioinformatics; Clinical trials; Semantics; Silver; Standards; Unified modeling language;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Bioinformatics and Biomedicine (BIBM), 2013 IEEE International Conference on
Conference_Location :
Shanghai
Type :
conf
DOI :
10.1109/BIBM.2013.6732541
Filename :
6732541