• DocumentCode
    397264
  • Title

    Computing highly specific and mismatch tolerant oligomers efficiently

  • Author

    Yamada, Tomoyuki ; Morishita, Shinichi

  • Author_Institution
    Dept. of Computational Biol., Tokyo Univ., Japan
  • fYear
    2003
  • fDate
    11-14 Aug. 2003
  • Firstpage
    316
  • Lastpage
    325
  • Abstract
    The sequencing of the genomes of a variety of species and the growing databases containing expressed sequence tags (ESTs) and complementary DNAs (cDNAs) facilitate the design of highly specific oligomers for use as genomic markers, PCR primers, or DNA oligo microarrays. The first step in evaluating the specificity of short oligomers of about twenty units in length is to determine the frequencies at which the oligomers occur. However, for oligomers longer than about fifty units this is not efficient, as they usually have a frequency of only 1. A more suitable procedure is to consider the mismatch tolerance of an oligomer, that is, the minimum number of mismatches that allows a given oligomer to match a sub-sequence other than the target sequence anywhere in the genome or the EST database. However, calculating the exact value of mismatch tolerance is computationally costly and impractical. Therefore, we studied the problem of checking whether an oligomer meets the constraint that its mismatch tolerance is no less than a given threshold. Here, we present an efficient dynamic programming algorithm that utilizes suffix and height arrays. We demonstrated the effectiveness of this algorithm by efficiently computing a dense list of oligo-markers applicable to the human genome. Experimental results show that the algorithm runs faster than Abrahamson's well-known algorithm by orders of magnitude and is able to enumerate 63% to 79% of qualified oligomers.
  • Keywords
    DNA; arrays; biology computing; dynamic programming; genetic algorithms; genetics; polymers; Abrahamson's algorithm; DNA oligo microarrays; PCR primers; complementary DNA; dynamic programming algorithm solution; expressed sequence tags; genomes; genomic markers; mismatch tolerance; oligomers; Bioinformatics; Biology computing; DNA; Databases; Fluorescence; Frequency; Gene expression; Genomics; Humans; Sequences;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics Conference, 2003. CSB 2003. Proceedings of the 2003 IEEE
  • Print_ISBN
    0-7695-2000-6
  • Type

    conf

  • DOI
    10.1109/CSB.2003.1227332
  • Filename
    1227332