Title :
Graphical Models of Residue Coupling in Protein Families
Author :
Thomas, John ; Ramakrishnan, Naren ; Bailey-Kellogg, Chris
Author_Institution :
Dept. of Comput. Sci., Dartmouth Coll., Hanover, NH
Abstract :
Many statistical measures and algorithmic techniques have been proposed for studying residue coupling in protein families. Generally speaking, two residue positions are considered coupled if, in the sequence record, some of their amino acid type combinations are significantly more common than others. While the proposed approaches have proven useful in finding and describing coupling, a significant missing component is a formal probabilistic model that explicates and compactly represents the coupling, integrates information about sequence, structure, and function, and supports inferential procedures for analysis, diagnosis, and prediction. We present an approach to learning and using probabilistic graphical models of residue coupling (GMRCs). These models capture significant conservation and coupling constraints observable in a multiply aligned set of sequences. Our approach can place a structural prior on considered couplings, so that all identified relationships have direct mechanistic explanations. It can also incorporate information about functional classes, and thereby learn a differential graphical model that distinguishes constraints common to all classes from those unique to individual classes. Such differential models separately account for class-specific conservation and family-wide coupling, two different sources of sequence covariation. They are then able to perform interpretable functional classification of new sequences, explaining classification decisions in terms of the underlying conservation and coupling constraints. We apply our approach in studying both G protein-coupled receptors and PDZ domains, identifying and analyzing family-wide and class-specific constraints, and performing functional classification. The results demonstrate that GMRCs provide a powerful tool for uncovering, representing, and utilizing significant sequence-structure-function relationships in protein families.
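The abstract's working definition of coupling (some amino acid combinations at two positions are significantly more common than others) can be made concrete with a generic pairwise score. The following is a minimal sketch under stated assumptions, not the GMRC learning procedure or the specific statistics used by the authors: it scores two alignment columns by mutual information and estimates significance with a permutation test that shuffles one column, which keeps each position's own conservation fixed while destroying any pairing between them. The toy alignment `TOY` and the helpers `column`, `mutual_information`, and `coupling_p_value` are hypothetical names chosen for illustration.

```python
import math
import random
from collections import Counter

# Hypothetical toy alignment: positions 0 and 2 swap K/E together
# (a charge-pair-like covariation); position 1 is fully conserved.
TOY = ["KAE", "KAE", "EAK", "EAK", "KAE", "EAK",
       "KAE", "EAK", "KAE", "EAK", "KAE", "EAK"]


def column(alignment, i):
    """Return the amino acids at position i across all aligned sequences."""
    return [seq[i] for seq in alignment]


def mutual_information(col_a, col_b):
    """Mutual information (in nats) between two alignment columns.

    It is large when some residue pairs co-occur more often than the
    single-column frequencies alone would predict, and zero when the
    columns are independent (e.g. when one column is fully conserved).
    """
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (x, y), c in count_ab.items():
        # p(x,y) * log(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
    return mi


def coupling_p_value(alignment, i, j, n_perm=1000, seed=0):
    """Permutation p-value for coupling between positions i and j.

    Shuffling column j preserves its amino acid composition (its
    conservation) but breaks any association with column i.
    Returns (p_value, observed_mi).
    """
    col_i, col_j = column(alignment, i), column(alignment, j)
    observed = mutual_information(col_i, col_j)
    rng = random.Random(seed)
    shuffled = list(col_j)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # small tolerance so shuffles that reproduce the observed MI count
        if mutual_information(col_i, shuffled) >= observed - 1e-12:
            hits += 1
    # add-one correction keeps the p-value strictly positive
    return (hits + 1) / (n_perm + 1), observed


if __name__ == "__main__":
    for i, j in [(0, 2), (0, 1)]:
        p, mi = coupling_p_value(TOY, i, j)
        print(f"positions {i},{j}: MI={mi:.3f} nats, p={p:.3f}")
```

In this toy case position 1 is conserved but uncoupled (its mutual information with position 0 is zero), while positions 0 and 2 vary together, echoing the abstract's distinction between conservation and coupling. A pairwise score like this treats each position pair in isolation; the approach described above instead learns one graphical model over all positions.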
Keywords :
biochemistry; biology computing; evolution (biological); genetics; graph theory; inference mechanisms; molecular biophysics; pattern classification; probability; proteins; G protein-coupled receptors; PDZ domains; algorithmic techniques; amino acid types; classification decisions; conservation constraints; coupling constraints; evolutionary covariation; formal probabilistic model; inferential procedures; learning approach; probabilistic graphical models; protein families; protein sequence-structure-function relationships; residue coupling; residue positions; statistical measures; Correlated mutations; functional classification; graphical models; sequence-structure-function relationships; Amino Acid Sequence; Animals; Artificial Intelligence; Cattle; Computer Graphics; Computer Simulation; Humans; Likelihood Functions; Models, Molecular; Models, Statistical; PDZ Domains; Proteins; Receptors, G-Protein-Coupled; Rhodopsin; Sequence Alignment;
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
DOI :
10.1109/TCBB.2007.70225