DocumentCode
3087696
Title
Extraction and Grounding of Protein Mutations via Semantic Integration of Text and Sequence Information
Author
Baker, Christopher ; Kanagasabai, Rajaraman
Author_Institution
Univ. of New Brunswick, St. John, NB, Canada
fYear
2011
fDate
22-25 March 2011
Firstpage
556
Lastpage
563
Abstract
Rich information on mutations and their impacts is scattered across the scientific literature. Reusing mutation impact annotations requires, as a critical step, grounding mutations to the correct positions on sequences extracted from protein databases. This paper presents a generic method for grounding textual mentions of mutation entities to protein sequences. The method is based on an OWL-DL ontology-driven workflow that integrates text and sequence information in a semantically consistent way. Mutation mentions mined from texts are iteratively mapped onto candidate proteins, and an ontology mining algorithm facilitates their correct grounding to a protein sequence. Using a gold-standard corpus of full-text articles and corresponding protein sequences, we show that the proposed method is promising compared to existing approaches.
Keywords
biology computing; data mining; knowledge representation languages; text analysis; OWL-DL ontology; ontology mining algorithm; protein databases; protein mutations grounding; semantic integration; sequence information; text information; Databases; Grounding; Ontologies; Protein sequence; Text mining; Mutation Extraction; Mutation Grounding; Sequence Analysis
fLanguage
English
Publisher
IEEE
Conference_Titel
Advanced Information Networking and Applications (AINA), 2011 IEEE International Conference on
Conference_Location
Biopolis
ISSN
1550-445X
Print_ISBN
978-1-61284-313-1
Electronic_ISBN
1550-445X
Type
conf
DOI
10.1109/AINA.2011.112
Filename
5763475