• DocumentCode
    2710868
  • Title

    Frequent Subgraph Retrieval in Geometric Graph Databases

  • Author

    Nowozin, Sebastian ; Tsuda, Koji

  • Author_Institution
    Max Planck Inst. for Biol. Cybern., Tubingen
  • fYear
    2008
  • fDate
    15-19 Dec. 2008
  • Firstpage
    953
  • Lastpage
    958
  • Abstract
    Discovery of knowledge from geometric graph databases is of particular importance in chemistry and biology, because chemical compounds and proteins are represented as graphs with 3D geometric coordinates. In such applications, scientists are not interested in the statistics of the whole database. Instead they need information about a novel drug candidate or protein at hand, represented as a query graph. We propose a polynomial-delay algorithm for geometric frequent subgraph retrieval. It enumerates all subgraphs of a single given query graph which are frequent geometric epsi-subgraphs under the entire class of rigid geometric transformations in a database. By using geometric epsi-subgraphs, we achieve tolerance against variations in geometry. We compare the proposed algorithm to gSpan on chemical compound data, and we show that for a given minimum support the total number of frequent patterns is substantially limited by requiring geometric matching. Although the computation time per pattern is larger than for non-geometric graph mining, the total time is within a reasonable level even for small minimum support.
  • Keywords
    data mining; database management systems; geometry; graph theory; query processing; 3D geometric coordinates; chemical compound data; geometric frequent subgraph retrieval; geometric graph databases; knowledge discovery; query graph; Chemical compounds; Chemistry; Drugs; Geometry; Information retrieval; Pattern matching; Polynomials; Proteins; Spatial databases; Statistics; frequent graph mining; geometric graph mining; geometric patterns;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Data Mining, 2008. ICDM '08. Eighth IEEE International Conference on
  • Conference_Location
    Pisa
  • ISSN
    1550-4786
  • Print_ISBN
    978-0-7695-3502-9
  • Type

    conf

  • DOI
    10.1109/ICDM.2008.38
  • Filename
    4781207