Detecting Positively Selected Sites From Amino Acid Sequences: An Implicit Codon Model

Author

Zheng Ouyang ; Jie Liang

Author_Institution

Univ. of Illinois at Chicago, Chicago

fYear

2007

fDate

22-26 Aug. 2007

Firstpage

5302

Lastpage

5306

Abstract

Fixation of advantageous mutations is an important evolutionary force driving accelerated protein diversification. However, the standard phylogenetic approach to inferring positive selection is based on the relative rate of nonsynonymous to synonymous substitutions and requires knowledge of the DNA sequences, which precludes its application to families of remotely related sequences where substitutions are saturated. In this study, we develop a new method to detect positive selection directly from amino acid sequences by treating codon usage as hidden parameters. For a given set of amino acid sequences and a phylogenetic tree, we use a reversible continuous-time Markov process as our evolutionary model. This model has fewer parameters than a standard amino acid evolutionary model: only the transition/transversion rate ratio, the nonsynonymous/synonymous rate ratio (omega = d_N/d_S), and codon usage. Similar to earlier work, we assume that omega is a random variable that takes each of a set of discrete values with a different probability. Values with omega > 1 model sites under positive selection. We use a Bayesian Monte Carlo method to estimate the model parameters, as it allows the implementation of complex models of sequence evolution. Here, the unobserved DNA sequences are sampled from the protein sequences according to distributions parametrized by codon usage, relying on the fact that the protein sequences and the native protein-encoding DNA sequences share the same phylogenetic tree. The objective is that the sampled DNA sequences should fit this phylogenetic tree as well as the native DNA sequences do. A data set of beta-globin sequences from vertebrates is used to verify our model. We are able to detect all eight positively selected sites that were originally reported using the native nucleotide sequences. Our work shows that although the nonsynonymous/synonymous rate ratio is defined at the codon level, it can be used to detect selective pressure on amino acid sequences through our implicit codon-based model.
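
The two ingredients named in the abstract, a codon substitution rate governed by the transition/transversion ratio, omega, and codon usage, and the sampling of unobserved codons from a protein sequence, can be illustrated with a short Python sketch. The abstract does not give the exact rate matrix, so the form below is an assumption: it follows the parameterization commonly used in codon models, where only single-nucleotide changes have nonzero rate. The names `rate`, `sample_codons`, `kappa`, and `pi` are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
import itertools
import random

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO)
}
SENSE_CODONS = [c for c, aa in CODON_TO_AA.items() if aa != "*"]
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def rate(ci, cj, kappa, omega, pi):
    """Assumed instantaneous rate from codon ci to codon cj (ci != cj).

    kappa: transition/transversion rate ratio
    omega: nonsynonymous/synonymous rate ratio (d_N/d_S)
    pi:    dict mapping each sense codon to its usage frequency
    """
    diffs = [(a, b) for a, b in zip(ci, cj) if a != b]
    if len(diffs) != 1:
        return 0.0  # only single-nucleotide changes are allowed
    q = pi[cj]
    if frozenset(diffs[0]) in TRANSITIONS:
        q *= kappa  # transition (A<->G or C<->T)
    if CODON_TO_AA[ci] != CODON_TO_AA[cj]:
        q *= omega  # nonsynonymous change
    return q


def sample_codons(protein, pi, rng):
    """Draw one codon per residue, weighted by codon usage pi.

    Assumes protein contains only the 20 standard one-letter amino acids.
    """
    sampled = []
    for aa in protein:
        choices = [c for c in SENSE_CODONS if CODON_TO_AA[c] == aa]
        weights = [pi[c] for c in choices]
        sampled.append(rng.choices(choices, weights=weights, k=1)[0])
    return sampled


if __name__ == "__main__":
    uniform = {c: 1.0 / len(SENSE_CODONS) for c in SENSE_CODONS}
    rng = random.Random(0)
    print(sample_codons("MVHLTPEEK", uniform, rng))
    print(rate("TTT", "TTC", kappa=2.0, omega=0.5, pi=uniform))
```

With this form, pi[i] * rate(i, j) equals pi[j] * rate(j, i) for every codon pair, which is the reversibility property the abstract requires of the Markov process. In a full Bayesian Monte Carlo scheme, a sampler of this kind would be one move among several, alongside updates of kappa, the omega classes, and the codon usage itself.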

Keywords

Bayes methods; DNA; Markov processes; Monte Carlo methods; biochemistry; biology computing; evolution (biological); genetics; molecular biophysics; proteins; Bayesian Monte Carlo method; DNA sequences; amino acid sequences; beta-globin sequences; evolutionary model; hidden parameters; implicit codon model; native nucleotide sequences; native protein-encoding DNA sequences; nonsynonymous-synonymous rate ratio; phylogenetic approach; phylogenetic tree; positive selection sites detection; protein sequences; reversible continuous time Markov process; transition-transversion rate ratio; vertebrates; Acceleration; Amino acids; Bayesian methods; DNA; Genetic mutations; Markov processes; Phylogeny; Proteins; Random variables; Sequences; Algorithms; Amino Acid Sequence; Codon; Computer Simulation; DNA Mutational Analysis; Evolution, Molecular; Globins; Models, Genetic; Molecular Sequence Data; Selection, Genetic; Sequence Analysis;

fLanguage

English

Publisher

IEEE

Conference_Titel

Engineering in Medicine and Biology Society, 2007. EMBS 2007. 29th Annual International Conference of the IEEE

Conference_Location

Lyon

ISSN

1557-170X

Print_ISBN

978-1-4244-0787-3

Type

conf

DOI

10.1109/IEMBS.2007.4353538

Filename

4353538

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=1850085