• DocumentCode
    1282767
  • Title
    An Overview of BioCreative II.5
  • Author
    Leitner, Florian ; Mardis, Scott A. ; Krallinger, Martin ; Cesareni, Gianni ; Hirschman, Lynette A. ; Valencia, Alfonso
  • Author_Institution
    Struct. Biol. & BioComputing Programme, Spanish Nat. Cancer Res. Centre (CNIO), Madrid, Spain
  • Volume
    7
  • Issue
    3
  • fYear
    2010
  • Firstpage
    385
  • Lastpage
    399
  • Abstract
    We present the results of the BioCreative II.5 evaluation in association with the FEBS Letters experiment, where authors created Structured Digital Abstracts to capture information about protein-protein interactions. The BioCreative II.5 challenge evaluated automatic annotations from 15 text mining teams based on a gold standard created by reconciling annotations from curators, authors, and automated systems. The tasks were to rank articles for curation based on curatable protein-protein interactions; to identify the interacting proteins (using UniProt identifiers) in the positive articles (61); and to identify interacting protein pairs. There were 595 full-text articles in the evaluation test set, including those both with and without curatable protein interactions. The principal evaluation metrics were the interpolated area under the precision/recall curve (AUC iP/R), and (balanced) F-measure. For article classification, the best AUC iP/R was 0.70; for interacting proteins, the best system achieved good macroaveraged recall (0.73) and interpolated area under the precision/recall curve (0.58), after filtering incorrect species and mapping homonymous orthologs; for interacting protein pairs, the top (filtered, mapped) recall was 0.42 and AUC iP/R was 0.29. Ensemble systems improved performance for the interacting protein task.
  • Keywords
    bioinformatics; data mining; molecular biophysics; proteins; text analysis; BioCreative II.5; FEBS Letters experiment; automated system; full-text article; homonymous orthologs; interacting protein pair; interpolated area; precision-recall curve; principal evaluation metric; protein-protein interaction; reconciling annotation; structured digital abstract; text mining; Abstracts; Data mining; Databases; Gold; Humans; Ontologies; Organisms; Proteins; Testing; Text mining; biological curation; molecular biology; natural language processing
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type
    jour
  • DOI
    10.1109/TCBB.2010.61
  • Filename
    5535012