DocumentCode
1282767
Title
An Overview of BioCreative II.5
Author
Leitner, Florian ; Mardis, Scott A. ; Krallinger, Martin ; Cesareni, Gianni ; Hirschman, Lynette A. ; Valencia, Alfonso
Author_Institution
Struct. Biol. & BioComputing Programme, Spanish Nat. Cancer Res. Centre (CNIO), Madrid, Spain
Volume
7
Issue
3
fYear
2010
Firstpage
385
Lastpage
399
Abstract
We present the results of the BioCreative II.5 evaluation, run in association with the FEBS Letters experiment, in which authors created Structured Digital Abstracts to capture information about protein-protein interactions. The BioCreative II.5 challenge evaluated automatic annotations from 15 text mining teams against a gold standard created by reconciling annotations from curators, authors, and automated systems. The tasks were to rank articles for curation based on curatable protein-protein interactions; to identify the interacting proteins (using UniProt identifiers) in the 61 positive articles; and to identify interacting protein pairs. The evaluation test set contained 595 full-text articles, both with and without curatable protein interactions. The principal evaluation metrics were the area under the interpolated precision/recall curve (AUC iP/R) and the (balanced) F-measure. For article classification, the best AUC iP/R was 0.70. For interacting proteins, after filtering incorrect species and mapping homonymous orthologs, the best system achieved a macroaveraged recall of 0.73 and an AUC iP/R of 0.58. For interacting protein pairs, the top (filtered, mapped) recall was 0.42 and the AUC iP/R was 0.29. Ensemble systems improved performance for the interacting protein task.
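The two principal metrics named in the abstract can be illustrated with a short sketch. This is not the official BioCreative II.5 scoring software: it assumes the common definition of interpolated precision at recall r (the maximum precision at any recall greater than or equal to r) and a step-wise area in which each recall increment along the ranked list is weighted by the interpolated precision at that rank. The function names `auc_ipr` and `f_measure` are hypothetical, and the official evaluation may differ in details such as tie handling.

```python
from typing import Sequence


def auc_ipr(ranked_labels: Sequence[bool], n_relevant: int) -> float:
    """Area under the interpolated precision/recall curve (sketch, assumed definition).

    ranked_labels[i] is True if the item at rank i+1 is relevant (in the gold standard).
    n_relevant is the total number of relevant gold-standard items, including any
    the system never returned, so missed items cap the reachable recall.
    """
    if n_relevant <= 0:
        raise ValueError("n_relevant must be positive")
    # (recall, precision) after each rank of the list.
    points = []
    tp = 0
    for rank, is_rel in enumerate(ranked_labels, start=1):
        if is_rel:
            tp += 1
        points.append((tp / n_relevant, tp / rank))
    # Interpolated precision: running maximum of precision from the bottom of the list up.
    interp = [0.0] * len(points)
    best = 0.0
    for i in range(len(points) - 1, -1, -1):
        best = max(best, points[i][1])
        interp[i] = best
    # Step-wise area: each recall increment times the interpolated precision where it occurs.
    area = 0.0
    prev_recall = 0.0
    for (recall, _), ip in zip(points, interp):
        area += (recall - prev_recall) * ip
        prev_recall = recall
    return area


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, a perfect ranking such as `auc_ipr([True, True, False], n_relevant=2)` yields 1.0, whereas a list that never retrieves half of the relevant items cannot score above 0.5, which is one reason recall-limited tasks such as protein pair extraction show low AUC iP/R values.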
Keywords
bioinformatics; data mining; molecular biophysics; proteins; text analysis; BioCreative II.5; FEBS letters experiment; automated system; full-text article; homonymous orthologs; interacting protein pair; interpolated area; precision-recall curve; principal evaluation metric; protein-protein interaction; reconciling annotation; structured digital abstract; text mining; Abstracts; Data mining; Databases; Gold; Humans; Ontologies; Organisms; Proteins; Testing; Text mining; biological curation; molecular biology; natural language processing;
fLanguage
English
Journal_Title
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher
IEEE
ISSN
1545-5963
Type
jour
DOI
10.1109/TCBB.2010.61
Filename
5535012