DocumentCode :
1850085
Title :
Detecting Positively Selected Sites From Amino Acid Sequences: An Implicit Codon Model
Author :
Zheng Ouyang ; Jie Liang
Author_Institution :
Univ. of Illinois at Chicago, Chicago
fYear :
2007
fDate :
22-26 Aug. 2007
Firstpage :
5302
Lastpage :
5306
Abstract :
Fixation of advantageous mutations is an important evolutionary force driving accelerated protein diversification. However, the standard phylogenetic approach to inferring positive selection is based on the relative rate of nonsynonymous to synonymous substitutions and requires knowledge of the DNA sequences, which precludes its application to families of remotely related sequences where saturated substitutions occur. In this study, we develop a new method to detect positive selection directly from amino acid sequences by treating codon usage as hidden parameters. For a given set of amino acid sequences and a phylogenetic tree, we use a reversible continuous-time Markov process as our evolutionary model. This model has fewer parameters than a typical amino acid evolutionary model: only the transition/transversion rate ratio, the nonsynonymous/synonymous rate ratio (omega = dN/dS), and codon usage. Similar to earlier work, we assume that omega is a random variable that takes each of a set of discrete values with a different probability. Values with omega > 1 model sites under positive selection. We use a Bayesian Monte Carlo method to estimate the model parameters, as it allows the implementation of complex models of sequence evolution. Here, unobserved DNA sequences are sampled from protein sequences according to distributions parametrized by codon usage, based on the fact that protein sequences and the native protein-encoding DNA sequences share the same phylogenetic tree. The objective is that the sampled DNA sequences should fit the same phylogenetic tree as well as the native DNA sequences do. A data set of beta-globin sequences from vertebrates is used to verify our model. We are able to detect all eight positively selected sites that were originally reported using native nucleotide sequences. Our work shows that although the nonsynonymous/synonymous rate ratio is defined at the codon level, it can be used to detect selective pressure on amino acid sequences through our implicit codon-based model.
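The abstract's step of sampling unobserved DNA sequences from protein sequences can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `sample_dna` and `translate`, the use of the standard genetic code, and the representation of codon usage as a plain dictionary of weights are all assumptions made for illustration. It covers only codon-usage-weighted back-translation, not the Bayesian Monte Carlo estimation of the model parameters.

```python
import random
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks stops.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def sample_dna(protein, codon_usage, rng):
    """Draw one DNA sequence that encodes `protein`.

    codon_usage maps codon -> non-negative weight (hypothetical format).
    For each residue, one of its synonymous codons is drawn with
    probability proportional to its weight.
    """
    codons = []
    for aa in protein:
        options = [c for c, a in CODON_TABLE.items() if a == aa]
        if not options:
            raise ValueError(f"unknown amino acid: {aa!r}")
        weights = [codon_usage.get(c, 0.0) for c in options]
        if sum(weights) == 0:
            # Fall back to uniform usage if no weight is available.
            weights = [1.0] * len(options)
        codons.append(rng.choices(options, weights=weights, k=1)[0])
    return "".join(codons)


def translate(dna):
    """Translate a DNA string back to amino acids, codon by codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    rng = random.Random(0)
    uniform_usage = {codon: 1.0 for codon in CODON_TABLE}
    peptide = "MVHLTPEEK"  # arbitrary example peptide
    dna = sample_dna(peptide, uniform_usage, rng)
    print(dna, translate(dna) == peptide)
```

Every sampled sequence translates back to the input protein by construction, so repeated draws differ only in synonymous codon choices, which is the hidden information the abstract says is parametrized by codon usage.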
Keywords :
Bayes methods; DNA; Markov processes; Monte Carlo methods; biochemistry; biology computing; evolution (biological); genetics; molecular biophysics; proteins; Bayesian Monte Carlo method; DNA sequences; amino acid sequences; beta-globin sequences; evolutionary model; hidden parameters; implicit codon model; native nucleotide sequences; native protein-encoding DNA sequences; nonsynonymous-synonymous rate ratio; phylogenetic approach; phylogenetic tree; positive selection sites detection; protein sequences; reversible continuous time Markov process; transition-transversion rate ratio; vertebrates; Acceleration; Amino acids; Bayesian methods; DNA; Genetic mutations; Markov processes; Phylogeny; Proteins; Random variables; Sequences; Algorithms; Amino Acid Sequence; Codon; Computer Simulation; DNA Mutational Analysis; Evolution, Molecular; Globins; Models, Genetic; Molecular Sequence Data; Selection, Genetic; Sequence Analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Engineering in Medicine and Biology Society, 2007. EMBS 2007. 29th Annual International Conference of the IEEE
Conference_Location :
Lyon
ISSN :
1557-170X
Print_ISBN :
978-1-4244-0787-3
Type :
conf
DOI :
10.1109/IEMBS.2007.4353538
Filename :
4353538