Graphical Models of Residue Coupling in Protein Families

Author

Thomas, John ; Ramakrishnan, Naren ; Bailey-Kellogg, Chris

Author_Institution

Dept. of Comput. Sci., Dartmouth Coll., Hanover, NH

Volume

Issue

fYear

2008

Firstpage

183

Lastpage

197

Abstract

Many statistical measures and algorithmic techniques have been proposed for studying residue coupling in protein families. Generally speaking, two residue positions are considered coupled if, in the sequence record, some of their amino acid type combinations are significantly more common than others. While the proposed approaches have proven useful in finding and describing coupling, a significant missing component is a formal probabilistic model that explicates and compactly represents the coupling, integrates information about sequence, structure, and function, and supports inferential procedures for analysis, diagnosis, and prediction. We present an approach to learning and using probabilistic graphical models of residue coupling (GMRCs). These models capture significant conservation and coupling constraints observable in a multiply aligned set of sequences. Our approach can place a structural prior on considered couplings, so that all identified relationships have direct mechanistic explanations. It can also incorporate information about functional classes, and thereby learn a differential graphical model that distinguishes constraints common to all classes from those unique to individual classes. Such differential models separately account for class-specific conservation and family-wide coupling, two different sources of sequence covariation. They are then able to perform interpretable functional classification of new sequences, explaining classification decisions in terms of the underlying conservation and coupling constraints. We apply our approach in studying both G protein-coupled receptors and PDZ domains, identifying and analyzing family-wide and class-specific constraints, and performing functional classification. The results demonstrate that GMRCs provide a powerful tool for uncovering, representing, and utilizing significant sequence-structure-function relationships in protein families.
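As a minimal illustration of the coupling notion described in the abstract (two positions whose amino acid combinations deviate from what independent columns would produce), the sketch below scores a pair of alignment columns by mutual information, one commonly used coupling statistic. It is not the GMRC learning procedure itself. The function name `column_mi` and the toy alignment are hypothetical; gaps are treated as ordinary symbols, no pseudocounts or sequence weighting are applied, and judging whether a score is significant (for example with a permutation test) is left out.

```python
import math
from collections import Counter


def column_mi(alignment, i, j):
    """Mutual information (in nats) between columns i and j of an alignment.

    Hypothetical helper: uses raw frequencies, no pseudocounts or weighting.
    """
    n = len(alignment)
    # Joint counts of amino acid pairs observed at positions i and j.
    pair_counts = Counter((seq[i], seq[j]) for seq in alignment)
    # Marginal counts at each position.
    counts_i = Counter(seq[i] for seq in alignment)
    counts_j = Counter(seq[j] for seq in alignment)
    mi = 0.0
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        # Each observed pair contributes p(a,b) * log(p(a,b) / (p(a) p(b))).
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


# Toy alignment: positions 0 and 1 always co-vary (A with K, G with E),
# while positions 0 and 2 combine independently.
aln = ["AKL", "AKV", "GEL", "GEV"]
print(round(column_mi(aln, 0, 1), 4))  # 0.6931 (= ln 2)
print(round(column_mi(aln, 0, 2), 4))  # 0.0
```

A score of zero means the observed pair frequencies factor exactly into the column frequencies; larger values indicate that some combinations are over-represented relative to independence, which is the kind of pairwise signal a graphical model of coupling then represents explicitly.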

Keywords

biochemistry; biology computing; evolution (biological); genetics; graph theory; inference mechanisms; molecular biophysics; pattern classification; probability; proteins; G protein-coupled receptors; PDZ domains; algorithmic techniques; amino acid types; classification decisions; conservation constraints; coupling constraints; evolutionary covariation; formal probabilistic model; inferential procedures; learning approach; probabilistic graphical models; protein families; protein sequence-structure-function relationships; residue coupling; residue positions; statistical measures; Correlated mutations; evolutionary covariation; functional classification; graphical models; sequence-structure-function relationships; Amino Acid Sequence; Animals; Artificial Intelligence; Cattle; Computer Graphics; Computer Simulation; Humans; Likelihood Functions; Models, Molecular; Models, Statistical; PDZ Domains; Proteins; Receptors, G-Protein-Coupled; Rhodopsin; Sequence Alignment;

fLanguage

English

Journal_Title

Computational Biology and Bioinformatics, IEEE/ACM Transactions on

Publisher

IEEE

ISSN

1545-5963

Type

jour

DOI

10.1109/TCBB.2007.70225

Filename

4359882

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=952168