Title :
Predicting Ligand Binding Residues and Functional Sites Using Multipositional Correlations with Graph Theoretic Clustering and Kernel CCA
Author :
González, Alvaro J. ; Liao, Li ; Wu, Cathy H.
Author_Institution :
Comput. & Inf. Sci. Dept., Univ. of Delaware, Newark, DE, USA
Abstract :
We present a new computational method for predicting ligand binding residues and functional sites in protein sequences. These residues and sites tend not only to be conserved but also to exhibit strong correlation, owing to selection pressure during evolution to maintain the required structure and/or function. To explore the effect of correlations among multiple positions in the sequences, the method uses graph theoretic clustering and kernel-based canonical correlation analysis (kCCA) to identify binding and functional sites as the residues that exhibit strong correlation between the residues' evolutionary characterization at the sites and the structure-based functional classification of the proteins in the context of a functional family. Tests on two well-curated data sets show that prediction accuracy, as measured by Receiver Operating Characteristic (ROC) scores, improves significantly when multipositional correlations are accounted for.
Keywords :
biology computing; evolutionary computation; graph theory; molecular biophysics; molecular configurations; proteins; computational method; evolution; graph theoretic clustering; kernel-based canonical correlation analysis; ligand binding residues; multipositional correlations; protein sequences; receiver operating characteristic score; structure-based functional classification; Amino acids; Bioinformatics; Computational biology; Correlation; Eigenvalues and eigenfunctions; Kernel; Proteins; Functional residues; cliques; kernel canonical correlation analysis; multiple sequence alignments; specificity determining positions; Algorithms; Amino Acid Sequence; Binding Sites; Cluster Analysis; Computational Biology; Databases, Protein; Humans; Ligands; Molecular Sequence Data; Protein Conformation; Proteins; ROC Curve; Sequence Alignment; Sequence Analysis, Protein
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
DOI :
10.1109/TCBB.2011.136