  • DocumentCode
    3087696
  • Title
    Extraction and Grounding of Protein Mutations via Semantic Integration of Text and Sequence Information
  • Author
    Baker, Christopher; Kanagasabai, Rajaraman
  • Author_Institution
    Univ. of New Brunswick, St. John, NB, Canada
  • fYear
    2011
  • fDate
    22-25 March 2011
  • Firstpage
    556
  • Lastpage
    563
  • Abstract
    Rich information on mutations and their impacts is scattered across the scientific literature. Reusing mutation impact annotations requires, as a critical step, grounding mutations to the correct positions on sequences extracted from protein databases. This paper presents a generic method for grounding textual mentions of mutation entities to protein sequences, based on an OWL-DL ontology-driven workflow that integrates text and sequence information in a semantically consistent way. Mutation mentions mined from texts are iteratively mapped onto candidate proteins, and an ontology mining algorithm facilitates their correct grounding to a protein sequence. Using a gold standard corpus of full-text articles and corresponding protein sequences, we show that the proposed method is promising compared to existing approaches.
  • Keywords
    biology computing; data mining; knowledge representation languages; text analysis; OWL-DL ontology; ontology mining algorithm; protein databases; protein mutations grounding; semantic integration; sequence information; text information; Databases; Grounding; Ontologies; Protein sequence; Text mining; Mutation Extraction; Mutation Grounding; Sequence Analysis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advanced Information Networking and Applications (AINA), 2011 IEEE International Conference on
  • Conference_Location
    Biopolis
  • ISSN
    1550-445X
  • Print_ISBN
    978-1-61284-313-1
  • Electronic_ISBN
    1550-445X
  • Type
    conf
  • DOI
    10.1109/AINA.2011.112
  • Filename
    5763475