Title :
An Overview of BioCreative II.5
Author :
Leitner, Florian ; Mardis, Scott A. ; Krallinger, Martin ; Cesareni, Gianni ; Hirschman, Lynette A. ; Valencia, Alfonso
Author_Institution :
Struct. Biol. & BioComputing Programme, Spanish Nat. Cancer Res. Centre (CNIO), Madrid, Spain
Abstract :
We present the results of the BioCreative II.5 evaluation in association with the FEBS Letters experiment, in which authors created Structured Digital Abstracts to capture information about protein-protein interactions. The BioCreative II.5 challenge evaluated automatic annotations from 15 text mining teams against a gold standard created by reconciling annotations from curators, authors, and automated systems. The tasks were to rank articles for curation based on curatable protein-protein interactions; to identify the interacting proteins (using UniProt identifiers) in the 61 positive articles; and to identify interacting protein pairs. The evaluation test set comprised 595 full-text articles, including articles both with and without curatable protein interactions. The principal evaluation metrics were the interpolated area under the precision/recall curve (AUC iP/R) and the (balanced) F-measure. For article classification, the best AUC iP/R was 0.70; for interacting proteins, the best system achieved a macroaveraged recall of 0.73 and an AUC iP/R of 0.58 after filtering incorrect species and mapping homonymous orthologs; for interacting protein pairs, the top (filtered, mapped) recall was 0.42 and the AUC iP/R was 0.29. Ensemble systems improved performance on the interacting protein task.
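The metrics named in the abstract (AUC iP/R, balanced F-measure, and macroaveraging) can be illustrated with a minimal Python sketch. It assumes a common formulation in which the interpolated precision at recall r is the highest precision reached at any recall >= r, and AUC iP/R is the sum of these interpolated precisions over the recall increments contributed by each correctly ranked item; this may differ in detail from the official BioCreative II.5 scorer. The function names interpolated_pr_auc, f_measure, and macro_average are hypothetical and chosen for illustration only.

```python
from typing import Sequence


def interpolated_pr_auc(ranked_relevance: Sequence[bool], total_relevant: int) -> float:
    """Area under the interpolated precision/recall curve for one ranked list.

    Assumption: interpolated precision at recall r is the maximum precision
    at any recall >= r; area is a step sum over the recall gained per hit.
    Relevant items that are never retrieved contribute no area.
    """
    if total_relevant <= 0:
        return 0.0
    precisions = []  # precision at each rank where a relevant item appears
    hits = 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precisions.append(hits / rank)
    # Interpolate: sweep from the highest recall level back to the lowest,
    # carrying the running maximum precision.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Each relevant hit raises recall by 1 / total_relevant.
    return sum(precisions) / total_relevant


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_average(per_article_scores: Sequence[float]) -> float:
    """Macroaverage: unweighted mean of a score computed per article."""
    return sum(per_article_scores) / len(per_article_scores) if per_article_scores else 0.0


if __name__ == "__main__":
    # Hypothetical ranking: relevant, irrelevant, relevant; 3 relevant items exist.
    print(interpolated_pr_auc([True, False, True, False, False], 3))
    print(f_measure(tp=3, fp=1, fn=1))
    print(macro_average([0.5, 1.0]))
```

In the second example list below, [False, True, True], raw precision at the first hit is 0.5, but interpolation lifts it to 2/3 because a higher precision is reached at a later recall level; this is what distinguishes the interpolated curve from the raw one.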
Keywords :
bioinformatics; data mining; molecular biophysics; proteins; text analysis; BioCreative II.5; FEBS letters experiment; automated system; full-text article; homonymous orthologs; interacting protein pair; interpolated area; precision-recall curve; principal evaluation metric; protein-protein interaction; reconciling annotation; structured digital abstract; text mining; Abstracts; Data mining; Databases; Gold; Humans; Ontologies; Organisms; Proteins; Testing; Text mining; biological curation; molecular biology; natural language processing;
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
DOI :
10.1109/TCBB.2010.61