• DocumentCode
    952168
  • Title
    Graphical Models of Residue Coupling in Protein Families
  • Author
    Thomas, John; Ramakrishnan, Naren; Bailey-Kellogg, Chris
  • Author_Institution
    Dept. of Comput. Sci., Dartmouth Coll., Hanover, NH
  • Volume
    5
  • Issue
    2
  • fYear
    2008
  • Firstpage
    183
  • Lastpage
    197
  • Abstract
    Many statistical measures and algorithmic techniques have been proposed for studying residue coupling in protein families. Generally speaking, two residue positions are considered coupled if, in the sequence record, some of their amino acid type combinations are significantly more common than others. While the proposed approaches have proven useful in finding and describing coupling, a significant missing component is a formal probabilistic model that explicates and compactly represents the coupling, integrates information about sequence, structure, and function, and supports inferential procedures for analysis, diagnosis, and prediction. We present an approach to learning and using probabilistic graphical models of residue coupling (GMRCs). These models capture significant conservation and coupling constraints observable in a multiply aligned set of sequences. Our approach can place a structural prior on considered couplings, so that all identified relationships have direct mechanistic explanations. It can also incorporate information about functional classes, and thereby learn a differential graphical model that distinguishes constraints common to all classes from those unique to individual classes. Such differential models separately account for class-specific conservation and family-wide coupling, two different sources of sequence covariation. They are then able to perform interpretable functional classification of new sequences, explaining classification decisions in terms of the underlying conservation and coupling constraints. We apply our approach in studying both G protein-coupled receptors and PDZ domains, identifying and analyzing family-wide and class-specific constraints, and performing functional classification. The results demonstrate that GMRCs provide a powerful tool for uncovering, representing, and utilizing significant sequence-structure-function relationships in protein families.
  • Keywords
    biochemistry; biology computing; evolution (biological); genetics; graph theory; inference mechanisms; molecular biophysics; pattern classification; probability; proteins; G protein-coupled receptors; PDZ domains; algorithmic techniques; amino acid types; classification decisions; conservation constraints; coupling constraints; evolutionary covariation; formal probabilistic model; inferential procedures; learning approach; probabilistic graphical models; protein families; protein sequence-structure-function relationships; residue coupling; residue positions; statistical measures; Correlated mutations; functional classification; graphical models; sequence-structure-function relationships; Amino Acid Sequence; Animals; Artificial Intelligence; Cattle; Computer Graphics; Computer Simulation; Humans; Likelihood Functions; Models, Molecular; Models, Statistical; PDZ Domains; Proteins; Receptors, G-Protein-Coupled; Rhodopsin; Sequence Alignment
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type
    jour
  • DOI
    10.1109/TCBB.2007.70225
  • Filename
    4359882