• DocumentCode
    2963396
  • Title

    Leveraging Gene Ontology Annotations to Improve a Memory-Based Language Understanding System

  • Author

    Livingston, Kevin M. ; Johnson, Helen L. ; Verspoor, Karin ; Hunter, Lawrence E.

  • Author_Institution
    Center for Comput. Pharmacology, Univ. of Colorado Denver, Aurora, CO, USA
  • fYear
    2010
  • fDate
    22-24 Sept. 2010
  • Firstpage
    40
  • Lastpage
    45
  • Abstract
    This work evaluates how detailed knowledge about proteins can be leveraged for language understanding and disambiguation by OpenDMAP. OpenDMAP is a memory-based language understanding system that uses patterns to identify concepts in text. These patterns match not only lexical elements, such as words, but also semantic elements, such as references to proteins. This work started with an existing pattern set used to extract biological activation events from a corpus of GeneRIFs (sentences or phrases that each describe one of the many functions of a gene). This is a challenging task because many distinct activation concepts, in addition to being semantically similar, are described using very similar language. We augment the previous approach with additional semantic knowledge about proteins, in the form of associated Gene Ontology annotations, and a small corresponding modification to the ontology used by OpenDMAP. By incorporating additional background knowledge, we demonstrate that performance can be significantly improved without modifying the pattern set being used. Specifically, precision is improved by 20%, at a modest 6% cost to recall. The additional semantic knowledge allows for more specificity in the ontology used by OpenDMAP, which in turn automatically improves the specificity of the patterns being used to extract knowledge from text, reducing false positives by 75%.
  • Keywords
    natural language processing; ontologies (artificial intelligence); GeneRIF; OpenDMAP; biological activation events extraction; gene ontology annotation; memory based language understanding system; semantic knowledge; Data mining; Ontologies; Pattern matching; Proteins; Semantics; DMAP; Gene Ontology annotations; NLP; OpenDMAP; biological event extraction; direct memory access parsing; memory; natural language processing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Semantic Computing (ICSC), 2010 IEEE Fourth International Conference on
  • Conference_Location
    Pittsburgh, PA
  • Print_ISBN
    978-1-4244-7912-2
  • Electronic_ISBN
    978-0-7695-4154-9
  • Type

    conf

  • DOI
    10.1109/ICSC.2010.62
  • Filename
    5628828