• DocumentCode
    2380778
  • Title
    Algorithm for DNA sequence compression based on prediction of mismatch bases and repeat location
  • Author
    Kaipa, Kalyan Kumar ; Bopardikar, Ajit S. ; Abhilash, Srikantha ; Venkataraman, Parthasarathy ; Lee, Kyusang ; Ahn, TaeJin ; Narayanan, Rangavittal
  • Author_Institution
    SAIT India Lab., Samsung, Bangalore, India
  • fYear
    2010
  • fDate
    18 Dec. 2010
  • Firstpage
    851
  • Lastpage
    852
  • Abstract
    For DNA sequence compression, methods based on Markov modeling and repeats have been observed to give the best results. However, these methods tend to assume that mismatches in approximate repeats are uniformly distributed. We show that these replacements are not uniformly distributed and that compression efficiency can be improved by using a non-uniform distribution for mismatches. We also propose a hash-table-based method to predict repeat location, which works well for block-based genomic sequence compression algorithms. The proposed methods give good compression gains and can be incorporated into any algorithm that uses approximate repeats to realize similar gains.
  • Keywords
    DNA; Markov processes; bioinformatics; data compression; molecular biophysics; molecular configurations; DNA sequence compression algorithm; block based genomic sequence compression algorithms; compression efficiency; hash table based method; mismatch base prediction; mismatch prediction; nonuniform mismatch distribution; repeat location prediction;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedicine Workshops (BIBMW), 2010 IEEE International Conference on
  • Conference_Location
    Hong Kong
  • Print_ISBN
    978-1-4244-8303-7
  • Electronic_ISBN
    978-1-4244-8304-4
  • Type
    conf
  • DOI
    10.1109/BIBMW.2010.5703941
  • Filename
    5703941