• DocumentCode
    1992532
  • Title

    Graph and Topological Structure Mining on Scientific Articles

  • Author

    Wang, Fan ; Jin, Ruoming ; Agrawal, Gagan ; Piontkivska, Helen

  • Author_Institution
    Ohio State Univ., Columbus
  • fYear
    2007
  • fDate
    14-17 Oct. 2007
  • Firstpage
    1318
  • Lastpage
    1322
  • Abstract
    In this paper, we investigate a new approach for literature mining. We use frequent subgraph mining, and its generalization, topological structure mining, to find interesting relationships between gene names and other key biological terms in the text of scientific articles. We show how we can find keywords of interest and represent them as nodes of the graphs. We also propose several methods for inserting edges between these nodes. Our study initially focused on comparing: 1) different methods for constructing edges, and 2) patterns found from subgraph mining and topological structure mining. Subsequently, we analyzed several frequent topological minors reported by our experiments, and explained their scientific significance. Overall, our study shows the following. First, a simple method of constructing edges, which is based on sliding windows, seems to provide the best results. Second, we are able to find a much larger number of well-known and meaningful topological patterns with high support values than subgraphs. The frequent topological minors our algorithm found correspond well to known relationships between genes and biological terms. Thus, we believe that topological structure mining can be a very valuable tool for researchers who are not deeply familiar with the existing literature and want to obtain a quick summary of known relationships among key scientific names or terms.
  • Keywords
    arrays; biology computing; data mining; edge detection; genetics; graph theory; molecular biophysics; biological terms; edge constructing methods; gene microarrays; gene names; literature mining; scientific articles; sliding window method; subgraph mining; topological minors; topological patterns; topological structure mining; Biology; Computer science; Data mining; Diseases; Pattern matching; Proteins; Sequences; Social network services;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Bioengineering, 2007. BIBE 2007. Proceedings of the 7th IEEE International Conference on
  • Conference_Location
    Boston, MA
  • Print_ISBN
    978-1-4244-1509-0
  • Type

    conf

  • DOI
    10.1109/BIBE.2007.4375739
  • Filename
    4375739
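
The abstract mentions a sliding-window method for constructing edges between keyword nodes but does not spell out its details. The following is only a minimal sketch of one plausible reading, assuming that two distinct keywords are linked whenever they occur within a fixed number of consecutive tokens of each other. The function name `sliding_window_edges`, the `window` parameter, and the whitespace tokenization are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def sliding_window_edges(tokens, keywords, window=5):
    """Count co-occurrences of distinct keywords within `window` tokens.

    Assumption (not from the paper): two keyword occurrences are linked
    when their token positions differ by less than `window`.
    Returns a Counter mapping a sorted (keyword, keyword) pair to its count.
    """
    edges = Counter()
    # Keep only the positions where a keyword of interest appears.
    hits = [(i, tok) for i, tok in enumerate(tokens) if tok in keywords]
    # combinations() yields pairs in increasing position order, so j > i.
    for (i, a), (j, b) in combinations(hits, 2):
        if a != b and j - i < window:
            # Sort the pair so the edge is undirected.
            edges[tuple(sorted((a, b)))] += 1
    return edges


if __name__ == "__main__":
    text = "BRCA1 interacts with p53 in breast cancer cells and p53 regulates apoptosis"
    terms = {"BRCA1", "p53", "cancer", "apoptosis"}
    for pair, count in sliding_window_edges(text.split(), terms, window=4).items():
        print(pair, count)
```

Each article would yield one such weighted graph, and a frequent subgraph or topological structure miner could then be run over the resulting collection. A real pipeline would also need gene-name recognition and normalization, which this sketch leaves out.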