• DocumentCode
    1757075
  • Title
    Extended Graph-Based Models for Enhanced Similarity Search in Cavbase
  • Author
    Krotzky, Timo ; Fober, Thomas ; Hüllermeier, Eyke ; Klebe, Gerhard
  • Author_Institution
    Dept. of Pharm. Chem., Philipps-Univ., Marburg, Germany
  • Volume
    11
  • Issue
    5
  • fYear
    2014
  • fDate
    Sept.-Oct. 1 2014
  • Firstpage
    878
  • Lastpage
    890
  • Abstract
    To calculate similarities between molecular structures, measures based on the maximum common subgraph are frequently applied. For the comparison of protein binding sites, however, these measures are not fully appropriate, since graphs representing binding sites at a detailed atomic level tend to become very large. Because the maximum common subgraph problem is NP-hard, such large graphs make the comparison computationally demanding. Therefore, binding sites are compared using a less detailed, coarse graph model built upon so-called pseudocenters. Consequently, structural data are lost, since many atoms are discarded and no information about the shape of the binding site is considered. This loss is usually compensated by subsequent calculations based on additional information. These steps are usually quite expensive, making the whole approach very slow. The main drawback of a graph-based model based solely on pseudocenters, however, is the loss of information about the shape of the protein surface. In this study, we propose a novel and efficient modeling formalism that does not increase the size of the graph model compared to the original approach but leads to graphs containing considerably more information assigned to the nodes. More specifically, additional descriptors capturing surface characteristics are extracted from the local surface and attributed to the pseudocenters stored in Cavbase. These properties are evaluated as additional node labels, which add information and allow for much faster yet still very accurate comparisons between different structures.
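    The abstract does not spell out the comparison procedure, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. It models each binding site as a complete graph of pseudocenters. Each pseudocenter carries a type label, 3D coordinates, and a surface descriptor vector, where the descriptor is a hypothetical two-value stand-in for the surface-derived node labels. It then computes a maximum common subgraph as a maximum clique in the product graph, which is one common formulation of this problem. The thresholds `desc_tol` and `dist_tol`, the L1 descriptor distance, and the normalization by the larger graph are all illustrative assumptions.

```python
import math
from itertools import combinations

# Hypothetical pseudocenter: (type label, 3D coordinates, surface descriptor).
# The descriptor is an assumed stand-in for surface-derived node labels.

def l1(a, b):
    """L1 distance between two descriptor vectors (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def product_graph(g1, g2, desc_tol=0.5, dist_tol=1.0):
    """Pair nodes with equal type and similar descriptors; connect two pairs
    if their intra-site Euclidean distances agree within dist_tol."""
    nodes = [(i, j)
             for i, (t1, _, d1) in enumerate(g1)
             for j, (t2, _, d2) in enumerate(g2)
             if t1 == t2 and l1(d1, d2) <= desc_tol]
    adj = {v: set() for v in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # each pseudocenter may be matched at most once
        da = math.dist(g1[i][1], g1[k][1])
        db = math.dist(g2[j][1], g2[l][1])
        if abs(da - db) <= dist_tol:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj

def max_clique(adj):
    """Bron-Kerbosch with pivoting; returns one maximum clique."""
    best = set()

    def expand(r, p, x):
        nonlocal best
        if not p and not x:  # r is a maximal clique
            if len(r) > len(best):
                best = set(r)
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return best

def similarity(g1, g2, **tol):
    """Assumed normalization: matched pseudocenters / size of larger site."""
    clique = max_clique(product_graph(g1, g2, **tol))
    return len(clique) / max(len(g1), len(g2))

# Toy sites: g2 is g1 shifted by (1, 1, 1); only the aromatic descriptor differs.
g1 = [("donor", (0, 0, 0), (0.2, 0.8)),
      ("acceptor", (3, 0, 0), (0.5, 0.5)),
      ("aromatic", (0, 4, 0), (0.9, 0.1))]
g2 = [("donor", (1, 1, 1), (0.3, 0.7)),
      ("acceptor", (4, 1, 1), (0.6, 0.4)),
      ("aromatic", (1, 5, 1), (0.1, 0.9))]

print(round(similarity(g1, g2), 3))                  # 0.667
print(similarity(g1, g2, desc_tol=float("inf")))     # 1.0 (descriptors ignored)
```

    In this toy example the two sites have identical pseudocenter types and geometry, so ignoring the descriptors yields a similarity of 1.0, while the hypothetical descriptor check rejects the aromatic pair and lowers the score to 2/3. Stricter node labels also remove candidate pairs from the product graph before the clique search, which is one plausible way richer node labels can speed up the comparison.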
  • Keywords
    bioinformatics; graphs; molecular biophysics; proteins; Cavbase; NP-hard problem; coarse graph model; computationally demanding task; enhanced similarity search; extended graph-based models; maximum common subgraph; molecular structures; protein binding sites; protein surface; pseudocenters; structural data; Bioinformatics; Histograms; Principal component analysis; Proteins; Shape; Vectors; Cavbase; distance; maximum common subgraph; protein binding site; similarity measure; structural alignment;
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type
    jour
  • DOI
    10.1109/TCBB.2014.2325020
  • Filename
    6853389