Title of article :
A code in the protein coding genes
Author/Authors :
Didier G. Arquès, Christian J. Michel
Issue Information :
Journal with serial issue number, year 1997
Pages :
28
From page :
107
To page :
134
Abstract :
A statistical analysis with 12 288 autocorrelation functions applied to protein (coding) genes of prokaryotes and eukaryotes identifies three subsets of trinucleotides in their three frames: T0 = X0 ∪ {AAA, TTT}, with X0 = {AAC, AAT, ACC, ATC, ATT, CAG, CTC, CTG, GAA, GAC, GAG, GAT, GCC, GGC, GGT, GTA, GTC, GTT, TAC, TTC}, in frame 0 (the reading frame established by the ATG start trinucleotide); T1 = X1 ∪ {CCC} in frame 1; and T2 = X2 ∪ {GGG} in frame 2 (frames 1 and 2 being frame 0 shifted to the right by one and two nucleotides, respectively). These three subsets are identical in the two gene populations and have five important properties: (i) the maximal (20 trinucleotides) circular code property for X0 (resp. X1, X2), which allows frame 0 (resp. 1, 2) to be retrieved automatically in any region of the gene without a start codon; (ii) the DNA complementarity property (e.g. C(AAC) = GTT): C(T0) = T0, C(T1) = T2 and C(T2) = T1, which allows the two paired reading frames of a DNA double helix to code for amino acids simultaneously; (iii) the circular permutation property (e.g. P(AAC) = ACA): P(X0) = X1 and P(X1) = X2, implying that the two subsets X1 and X2 can be deduced from X0; (iv) the rarity property, with an occurrence probability of X0 equal to 6 × 10⁻⁸; and (v) concatenation properties in favour of an evolutionary code: a high frequency (27.5%) of misplaced trinucleotides in the shifted frames, a maximum length (13 nucleotides) of the minimal window needed to retrieve the frame automatically, and the occurrence of all four nucleotide types at the three trinucleotide sites. In the Discussion, a simulation based on an independent mixing of the trinucleotides of T0 retrieves the two subsets T1 and T2. Then, the identified subsets T0, T1 and T2, rewritten in the 2-letter genetic alphabet {R, Y} (R = purine = A or G, Y = pyrimidine = C or T), retrieve the RNY model (N = R or Y) and explain previous works in the alphabet {R, Y}. Then, these three subsets are related to the genetic code.
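Property (i) can be checked mechanically. The sketch below is a hedged illustration that uses a graph criterion from later circular-code literature, not the method of this paper: for each trinucleotide N1N2N3 of a code X, add the edges N1 → N2N3 and N1N2 → N3, and treat X as circular if and only if the resulting graph has no cycle. The helper names (`code_graph`, `has_cycle`, `is_circular`) are hypothetical.

```python
from itertools import product

# X0 as listed in the abstract (the frame 0 subset without AAA and TTT).
X0 = {
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
}
ALL_TRINUCLEOTIDES = {"".join(p) for p in product("ACGT", repeat=3)}


def code_graph(code):
    """Hypothetical helper: for each trinucleotide N1N2N3 add the edges
    N1 -> N2N3 and N1N2 -> N3 (vertices are letters and dinucleotides)."""
    graph = {}
    for w in code:
        graph.setdefault(w[0], set()).add(w[1:])
        graph.setdefault(w[:2], set()).add(w[2])
    return graph


def has_cycle(graph):
    """Depth-first search with three colours; reaching a grey vertex again closes a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}

    def visit(v):
        colour[v] = GREY
        for u in graph.get(v, ()):
            state = colour.get(u, WHITE)
            if state == GREY or (state == WHITE and visit(u)):
                return True
        colour[v] = BLACK
        return False

    return any(colour.get(v, WHITE) == WHITE and visit(v) for v in list(graph))


def is_circular(code):
    """Assumed criterion: a trinucleotide code is circular iff its graph is acyclic."""
    return not has_cycle(code_graph(code))


print(is_circular(X0))                    # True
print(is_circular(X0 | {"AAA", "TTT"}))   # False: AAA yields A -> AA -> A
# Maximality: adding any of the other 44 trinucleotides breaks circularity.
print(all(not is_circular(X0 | {w}) for w in ALL_TRINUCLEOTIDES - X0))  # True
```

Adding AAA closes the cycle A → AA → A, which is why T0 itself is not a circular code. More generally, every trinucleotide outside X0 is either periodic or a rotation of a word already in X0, and either case produces a two-edge cycle, consistent with X0 being maximal at 20 words.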
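The frame-retrieval idea in property (i) can be illustrated with a deliberately simplified sketch (an assumption, not the paper's window procedure): for each of the three offsets, test whether every complete trinucleotide of a window belongs to X0. Edge fragments are ignored, so the sketch makes no claim about the 13-nucleotide bound in property (v). `candidate_offsets` is a hypothetical name; the returned offset is the position in the window where frame 0 trinucleotides start.

```python
# X0 as listed in the abstract.
X0 = {
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
}


def candidate_offsets(window, code=X0):
    """Offsets (0, 1 or 2) at which every complete trinucleotide of `window`
    belongs to `code`. Simplification: edge fragments shorter than three
    nucleotides are ignored."""
    offsets = []
    for offset in range(3):
        trinucs = [window[i:i + 3] for i in range(offset, len(window) - 2, 3)]
        if trinucs and all(t in code for t in trinucs):
            offsets.append(offset)
    return offsets


# A window built from the X0 words GAA CTG GTA TTC ACC.
print(candidate_offsets("GAACTGGTATTCACC"))  # [0]
# The same window without its first nucleotide: frame 0 now starts at offset 2.
print(candidate_offsets("AACTGGTATTCACC"))   # [2]
# A 4-nucleotide window is too short to decide (ATT and TTC are both in X0).
print(candidate_offsets("ATTC"))             # [0, 1]
```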
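Properties (ii) and (iii) can be verified directly from X0. In this sketch, C is implemented as the reverse complement (the reading of a trinucleotide on the paired strand, which gives C(AAC) = GTT) and P as a left rotation by one nucleotide (P(AAC) = ACA). X1 and X2 are computed as P(X0) and P(X1), following property (iii); the function names `rev_comp` and `circ_perm` are illustrative.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# X0 as listed in the abstract.
X0 = {
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
}


def rev_comp(w):
    """C(w): the trinucleotide read on the paired DNA strand, e.g. C(AAC) = GTT."""
    return "".join(COMPLEMENT[b] for b in reversed(w))


def circ_perm(w):
    """P(w): move the first nucleotide to the end, e.g. P(AAC) = ACA."""
    return w[1:] + w[0]


X1 = {circ_perm(w) for w in X0}   # property (iii): P(X0) = X1
X2 = {circ_perm(w) for w in X1}   # property (iii): P(X1) = X2
T0 = X0 | {"AAA", "TTT"}
T1 = X1 | {"CCC"}
T2 = X2 | {"GGG"}

print(rev_comp("AAC"), circ_perm("AAC"))  # GTT ACA
print({rev_comp(w) for w in T0} == T0)    # True: C(T0) = T0
print({rev_comp(w) for w in T1} == T2)    # True: C(T1) = T2
print({rev_comp(w) for w in T2} == T1)    # True: C(T2) = T1
print(len(X0 | X1 | X2))                  # 60: the three subsets are disjoint
```

Together with AAA, CCC, GGG and TTT, the 60 words of X0, X1 and X2 account for all 64 trinucleotides.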
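To see how the RNY model appears in the {R, Y} alphabet, the following illustrative tally maps each trinucleotide to R/Y and counts purines at each site. It runs over the 20 words of X0 only and is not the statistical procedure of the paper; `to_ry` is a hypothetical helper.

```python
# X0 as listed in the abstract.
X0 = {
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
}


def to_ry(w):
    """Rewrite a word in the alphabet {R, Y}: R = A or G, Y = C or T."""
    return "".join("R" if b in "AG" else "Y" for b in w)


# Number of purines (R) at sites 1, 2 and 3 over the 20 trinucleotides of X0.
r_counts = [sum(to_ry(w)[site] == "R" for w in X0) for site in range(3)]
# Words of X0 that match the RNY pattern exactly.
rny = sorted(w for w in X0 if to_ry(w)[0] == "R" and to_ry(w)[2] == "Y")

print(r_counts)   # [15, 10, 5]
print(len(rny))   # 12
```

Purines dominate the first site and pyrimidines the third, while the middle site is balanced, which matches the R-N-Y pattern; 12 of the 20 words of X0 are strictly of the form RNY.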
The trinucleotides of T0 code for 13 amino acids: Ala, Asn, Asp, Gln, Glu, Gly, Ile, Leu, Lys, Phe, Thr, Tyr and Val. Finally, a strong correlation between the usage of the trinucleotides of T0 in protein genes and the amino acid frequencies in proteins is observed: six of the seven amino acids not coded by T0 have, as expected, the lowest frequencies in the proteins of both prokaryotes and eukaryotes.
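The list of 13 amino acids can be reproduced by translating T0 with the standard genetic code. The 64-letter string below is the usual compact encoding of the standard code with codons in TCAG order ("*" marks stop codons); `CODON_TABLE` and `THREE_LETTER` are illustrative names, not part of the paper.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AA_ONE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA_ONE)}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}

# T0 = X0 together with AAA and TTT, as listed in the abstract.
X0 = {
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
}
T0 = X0 | {"AAA", "TTT"}

# T0 contains no stop codon, so every lookup yields an amino acid.
coded = {THREE_LETTER[CODON_TABLE[w]] for w in T0}
not_coded = set(THREE_LETTER.values()) - coded

print(len(coded), sorted(coded))
print(sorted(not_coded))  # ['Arg', 'Cys', 'His', 'Met', 'Pro', 'Ser', 'Trp']
```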
Journal title :
BioSystems
Serial Year :
1997
Record number :
497345