• DocumentCode
    610063
  • Title
    An Adaptive Difference Distribution-Based Coding with Hierarchical Tree Structure for DNA Sequence Compression
  • Author
    Wenrui Dai ; Hongkai Xiong ; Xiaoqian Jiang ; Ohno-Machado, L.
  • Author_Institution
    Dept. of Electron. Eng., Shanghai Jiaotong Univ., Shanghai, China
  • fYear
    2013
  • fDate
    20-22 March 2013
  • Firstpage
    371
  • Lastpage
    380
  • Abstract
    Previous reference-based compression methods for DNA sequences do not fully exploit the intrinsic statistics because they consider only approximate matches. In this paper, an adaptive difference distribution-based coding framework is proposed that operates on fragments of nucleotides organized in a hierarchical tree structure. To keep the distribution of the difference sequence between the reference and target sequences concentrated, the sub-fragment size and the matching offset used for prediction adapt to the stepped size structure. Matching against approximate repeats in the reference is performed with a Hamming-like weighted distance measure function in a local region close to the current fragment, so that the accuracy of matching and the overhead of describing the matching offset can be balanced. A well-designed coding scheme compactly represents both the difference sequence and the additional parameters, e.g. sub-fragment size and matching offset. Experimental results show that the proposed scheme achieves a 150% compression improvement over the best reference-based compressor, GReEn.
  • Keywords
    DNA; Hamming codes; adaptive codes; biology computing; data compression; molecular biophysics; pattern matching; tree data structures; DNA sequence compression; Hamming-like weighted distance measure function; adaptive difference distribution-based coding; approximate matching offset; hierarchical tree structure; nucleotides; stepped size structure; sub-fragment size offset; Bioinformatics; Biological cells; DNA; Distance measurement; Encoding; Genomics; Sequential analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Compression Conference (DCC), 2013
  • Conference_Location
    Snowbird, UT
  • ISSN
    1068-0314
  • Print_ISBN
    978-1-4673-6037-1
  • Type
    conf
  • DOI
    10.1109/DCC.2013.45
  • Filename
    6543073