• DocumentCode
    2341190
  • Title
    A multi-level text mining method to extract biological relationships
  • Author
    Palakal, Mathew ; Stephens, Matthew ; Mukhopadhyay, Snehasis ; Raje, Rajeev ; Rhodes, Simon
  • Author_Institution
    Dept. of Comput. & Inf. Sci., Indiana Univ., Indianapolis, IN, USA
  • fYear
    2002
  • fDate
    2002
  • Firstpage
    97
  • Lastpage
    108
  • Abstract
    Accurate and computationally efficient approaches to discovering relationships between biological objects in text documents are important for biologists developing biological models. This paper presents a novel approach to extracting relationships between multiple biological objects present in a text document. The approach involves object identification, reference resolution, ontology and synonym discovery, and extraction of object-object relationships. Hidden Markov models (HMMs), dictionaries, and N-gram models provide the framework for the complex task of extracting object-object relationships. Experiments were carried out on a corpus of one thousand Medline abstracts. Intermediate results were obtained for object identification, synonym discovery, and, finally, relationship extraction. From the corpus of one thousand abstracts, 53 relationships were extracted, of which 43 were correct, giving a specificity of 81%. Unlike rule-based methods, the approach is both adaptable and scalable to new problems.
  • Keywords
    bibliographic systems; biology computing; data mining; dictionaries; hidden Markov models; scientific information systems; text analysis; Medline; N-Gram models; bibliographic database; biological models; biological relationships; dictionaries; experiments; hidden Markov models; multi-level text mining method; object identification; object-object relationships; ontology; reference resolution; synonym discovery; Abstracts; Bioinformatics; Biological system modeling; Biology computing; Data mining; Dictionaries; Hidden Markov models; Humans; Proteins; Text mining
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics Conference, 2002. Proceedings. IEEE Computer Society
  • Print_ISBN
    0-7695-1653-X
  • Type
    conf
  • DOI
    10.1109/CSB.2002.1039333
  • Filename
    1039333
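
The 81% figure quoted in the abstract can be checked from the two counts it gives (53 relationships extracted, 43 of them correct). The following is a minimal sketch of that arithmetic only; the function name `fraction_correct` is hypothetical and is not taken from the paper. The abstract labels this ratio "specificity"; the share of extracted items that are correct is more commonly called precision.

```python
def fraction_correct(correct: int, extracted: int) -> float:
    """Return the share of extracted relationships that were judged correct."""
    if extracted <= 0:
        raise ValueError("extracted must be a positive count")
    if not 0 <= correct <= extracted:
        raise ValueError("correct must lie between 0 and extracted")
    return correct / extracted


# Counts as reported in the abstract: 43 correct out of 53 extracted.
ratio = fraction_correct(43, 53)
print(f"{ratio:.1%}")  # prints 81.1%
```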