Title :
Computing highly specific and mismatch tolerant oligomers efficiently
Author :
Yamada, Tomoyuki ; Morishita, Shinichi
Author_Institution :
Dept. of Computational Biol., Tokyo Univ., Japan
Abstract :
The sequencing of the genomes of a variety of species and the growing databases containing expressed sequence tags (ESTs) and complementary DNAs (cDNAs) facilitate the design of highly specific oligomers for use as genomic markers, PCR primers, or DNA oligo microarrays. The first step in evaluating the specificity of short oligomers of about twenty units in length is to determine the frequencies at which the oligomers occur. However, for oligomers longer than about fifty units this is not efficient, as they usually have a frequency of only 1. A more suitable procedure is to consider the mismatch tolerance of an oligomer, that is, the minimum number of mismatches that allows a given oligomer to match a sub-sequence other than the target sequence anywhere in the genome or the EST database. However, calculating the exact value of mismatch tolerance is computationally costly and impractical. Therefore, we studied the problem of checking whether an oligomer meets the constraint that its mismatch tolerance is no less than a given threshold. Here, we present an efficient dynamic programming algorithm that utilizes suffix and height arrays. We demonstrated the effectiveness of this algorithm by efficiently computing a dense list of oligo-markers applicable to the human genome. Experimental results show that the algorithm runs faster than the well-known Abrahamson's algorithm by orders of magnitude and is able to enumerate 63% to 79% of qualified oligomers.
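The threshold check described in the abstract can be illustrated with a brute-force sketch. This is not the authors' suffix-array and height-array dynamic program; it is a naive sliding-window comparison, shown only to make the definition of mismatch tolerance concrete. The function names `hamming_at_most` and `meets_tolerance` are hypothetical, and the sketch assumes a single text string, a single strand, and mismatches only (no insertions or deletions).

```python
def hamming_at_most(a, b, limit):
    """Count mismatches between two equal-length strings.

    Counting stops as soon as the count exceeds `limit`, so the return
    value is exact only when it is <= limit.
    """
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > limit:
                break
    return mismatches


def meets_tolerance(text, start, length, threshold):
    """Return True if the oligomer text[start:start+length] has a
    mismatch tolerance of at least `threshold`.

    Assumption: every window of the same length at a position other
    than `start` counts as a non-target sub-sequence.
    """
    oligo = text[start:start + length]
    for pos in range(len(text) - length + 1):
        if pos == start:
            continue  # skip the target occurrence itself
        window = text[pos:pos + length]
        if hamming_at_most(oligo, window, threshold - 1) < threshold:
            return False  # a non-target window matches with too few mismatches
    return True


if __name__ == "__main__":
    genome = "AAAACCCC"
    # The oligomer "AAAA" at position 0 is one mismatch away from "AAAC".
    print(meets_tolerance(genome, 0, 4, 1))
    print(meets_tolerance(genome, 0, 4, 2))
```

This sketch compares the oligomer against every window, so its cost grows with the product of the text length and the oligomer length for each candidate, which illustrates why a more efficient method is needed at genome scale.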
Keywords :
DNA; arrays; biology computing; dynamic programming; genetic algorithms; genetics; polymers; Abrahamson's algorithm; DNA oligo microarrays; PCR primers; complementary DNA; dynamic programming algorithm solution; expressed sequence tags; genomes; genomic markers; mismatch tolerance; oligomers; Bioinformatics; Databases; Fluorescence; Frequency; Gene expression; Genomics; Humans; Sequences;
Conference_Title :
Bioinformatics Conference, 2003. CSB 2003. Proceedings of the 2003 IEEE
Print_ISBN :
0-7695-2000-6
DOI :
10.1109/CSB.2003.1227332