Title :
Search for evolution-related-oligonucleotides and conservative words in rRNA sequences
Author :
Luo, Liaofu ; Hsieh, Li-Ching ; Ji, Fengmin ; Jia, Mengwen ; Lee, H.C.
Author_Institution :
Phys. Dept., Inner Mongolia Univ., Hohhot, China
Abstract :
We describe a method for finding unmapped conserved words in rRNA sequences that is effective, uses evolutionary information, and does not depend on multiple sequence alignment. The evolutionary distance (called the n-distance) between a pair of 16S or 18S rRNA sequences is defined in terms of the difference between the two sets of occurrence frequencies of oligonucleotides n bases long (n-mers) in the two sequences. These n-distances are used to reconstruct phylogenetic trees for 35 representative organisms from all three kingdoms. The quality of the tree generally improves with increasing n and reaches a plateau of best fit at n=7 or 8. Hence the 7-mer or 8-mer (oligonucleotide of 7 or 8 bases) frequencies provide a basis for describing rRNA evolution. By analyzing the contribution of individual 7-mers to the 7-distances, we identify a set of 612 7-mers (called evolution-related-oligonucleotides, EROs) that are critical to the topology of the best phylogenetic tree. Expanding from this set of EROs, we identify evolution-related conservative words longer than 7 bases in 16S rRNA sequences from an enlarged set of 98 organisms in bacteria and archaea, based on two criteria: 1) the word is highly conserved in nearly all species of a kingdom (or a sub-kingdom); and 2) the word is located at nearly the same site in each sequence. Three examples of words thus found follow. The 13-mer ggattagataccc, located at the end of a loop near H24 (in E. coli), is conservative in almost all species in archaea and bacteria. The 8-mer aacgagcg, located on H35, is also conservative in archaea and bacteria. Its expansion, the 32-mer tgttgggttaagtcccgcaacgagcgcaaccc, is conservative in bacteria but not in archaea.
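The abstract does not give the exact formula for the n-distance, so the following is only a minimal sketch under stated assumptions: n-mers are counted with overlap, counts are normalized to relative frequencies, and the distance is taken as the sum of absolute frequency differences. The function names (`nmer_frequencies`, `n_distance`, `nmer_contributions`) are hypothetical and are not taken from the paper.

```python
from collections import Counter


def nmer_frequencies(seq, n):
    """Relative frequency of each overlapping n-mer in seq."""
    seq = seq.lower()
    total = len(seq) - n + 1
    if total <= 0:
        return {}
    counts = Counter(seq[i:i + n] for i in range(total))
    return {word: count / total for word, count in counts.items()}


def nmer_contributions(seq_a, seq_b, n):
    """Per-word absolute frequency difference between two sequences.

    Assumption: each n-mer contributes |f_a(word) - f_b(word)| to the
    distance. The paper's actual definition may differ.
    """
    fa = nmer_frequencies(seq_a, n)
    fb = nmer_frequencies(seq_b, n)
    return {
        word: abs(fa.get(word, 0.0) - fb.get(word, 0.0))
        for word in set(fa) | set(fb)
    }


def n_distance(seq_a, seq_b, n):
    """Assumed n-distance: the sum of all per-word contributions."""
    return sum(nmer_contributions(seq_a, seq_b, n).values())
```

Under these assumptions, identical sequences have distance 0 and two sequences sharing no n-mer have distance 2. Ranking the values returned by `nmer_contributions` over many sequence pairs is one plausible way to look for n-mers that dominate the distances, in the spirit of the ERO analysis described above.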
Keywords :
evolution (biological); genetic engineering; macromolecules; microorganisms; organic compounds; polymers; 16S rRNA sequences; 18S rRNA sequences; archaea; bacteria; evolution-related-oligonucleotides; evolutionary distance; phylogenetic trees; rRNA evolution; representative organisms; unmapped conservative words; Archaea; Biophysics; Constraint theory; Frequency; Microorganisms; Organisms; Phylogeny; Physics; Testing; Topology;
Conference_Title :
Bioinformatics Conference, 2003. CSB 2003. Proceedings of the 2003 IEEE
Print_ISBN :
0-7695-2000-6
DOI :
10.1109/CSB.2003.1227375