• DocumentCode
    2949905
  • Title
    BioWizard: Discovering and validating associations between biological entities by integrated analysis of scientific literature and experimental data
  • Author
    Spampinato, Concetto; Giordano, Daniela; Kavasidis, Isaak; Milardo, Sebastiano
  • Author_Institution
    Dept. of Electr., Electron. & Comput. Eng., Univ. of Catania, Catania, Italy
  • fYear
    2012
  • fDate
    20-22 June 2012
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    In this paper, we present BioWizard, a bioinformatics knowledge discovery tool for extracting and validating implicit associations between biological entities. By mining specialized scientific literature, BioWizard not only generates biological hypotheses in the form of associations between genes, proteins and diseases, but also validates the plausibility of such associations against high-throughput biological data (microarrays) and annotated databases. The main novelties of the proposed approach are: (1) it infers associations between biological entities by mining full-text papers rather than only abstracts, as existing tools usually do; (2) it uses named entity recognition to improve the precision of the derived associations by enriching the vocabularies used in the mining loop with terms extracted directly from the text; and (3) it filters the inferred associations according to their evidence in experimental data. We tested the precision and recall of our system in retrieving known associations (ones that did not appear in the same document) from gold standards. The results show that BioWizard is able to retrieve valid associations, making it a valuable tool to help biomedical researchers speed up scientific progress.
  • Keywords
    bioinformatics; data mining; diseases; genetics; information retrieval; molecular biophysics; proteins; BioWizard; annotated databases; bioinformatics knowledge discovery tool; biological entities; biological hypothesis generation; diseases; experimental data; genes; high-throughput biological data; integrated analysis; known-association retrieval; microarray data; named entity recognition; proteins; scientific literature; text mining; valid association retrieval; Databases; Dictionaries; Diseases; Protein engineering; Proteins; Vectors;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Computer-Based Medical Systems (CBMS), 2012 25th International Symposium on
  • Conference_Location
    Rome
  • ISSN
    1063-7125
  • Print_ISBN
    978-1-4673-2049-8
  • Type
    conf
  • DOI
    10.1109/CBMS.2012.6266327
  • Filename
    6266327