DocumentCode
2341190
Title
A multi-level text mining method to extract biological relationships
Author
Palakal, Mathew ; Stephens, Matthew ; Mukhopadhyay, Snehasis ; Raje, Rajeev ; Rhodes, Simon
Author_Institution
Dept. of Comput. & Inf. Sci., Indiana Univ., Indianapolis, IN, USA
fYear
2002
fDate
2002
Firstpage
97
Lastpage
108
Abstract
Accurate and computationally efficient approaches to discovering relationships between biological objects in text documents are important for biologists developing biological models. This paper presents a novel approach to extracting relationships between multiple biological objects that are present in a text document. The approach involves object identification, reference resolution, ontology and synonym discovery, and extraction of object-object relationships. Hidden Markov models (HMMs), dictionaries, and N-Gram models provide the framework for tackling the complex task of extracting object-object relationships. Experiments were carried out using a corpus of one thousand Medline abstracts. Intermediate results were obtained for object identification, synonym discovery, and finally relationship extraction. From the corpus of one thousand abstracts, 53 relationships were extracted, of which 43 were correct, giving a specificity of 81%. Unlike rule-based methods, the approach is both adaptable and scalable to new problems.
Keywords
bibliographic systems; biology computing; data mining; dictionaries; hidden Markov models; scientific information systems; text analysis; Medline; N-Gram models; bibliographic database; biological models; biological relationships; experiments; multi-level text mining method; object identification; object-object relationships; ontology; reference resolution; synonym discovery; Abstracts; Bioinformatics; Biological system modeling; Humans; Proteins; Text mining
fLanguage
English
Publisher
IEEE
Conference_Titel
Bioinformatics Conference, 2002. Proceedings. IEEE Computer Society
Print_ISBN
0-7695-1653-X
Type
conf
DOI
10.1109/CSB.2002.1039333
Filename
1039333