DocumentCode :
3601413
Title :
Compression of Multiple DNA Sequences Using Intra-Sequence and Inter-Sequence Similarities
Author :
Kin-On Cheng ; Paula Wu ; Ngai-Fong Law ; Wan-Chi Siu
Author_Institution :
Dept. of Electron. & Inf. Eng., Hong Kong Polytech. Univ., Hong Kong, China
Volume :
12
Issue :
6
fYear :
2015
Firstpage :
1322
Lastpage :
1332
Abstract :
Traditionally, intra-sequence similarity is exploited for compressing a single DNA sequence. Recently, remarkable compression performance for an individual DNA sequence from the same population has been achieved by encoding its difference from a nearly identical reference sequence. Nevertheless, there is a lack of general algorithms that also allow less similar reference sequences. In this work, we extend intra-sequence similarity to inter-sequence similarity, in that approximate matches of subsequences are found between the DNA sequence and a set of reference sequences. Hence, a set of nearly identical DNA sequences from the same population, or a set of partially similar DNA sequences such as chromosome sequences and DNA sequences of related species, can be compressed together. For practical compressors, the compressed size is usually influenced by the order in which the sequences are compressed. Fast search algorithms for the optimal compression order are thus developed for multiple-sequence compression. Experimental results on artificial and real datasets demonstrate that our proposed multiple-sequence compression methods with fast compression order search are able to achieve good compression performance under different levels of similarity among the multiple DNA sequences.
Keywords :
DNA; biology computing; data compression; molecular biophysics; molecular configurations; artificial datasets; chromosome sequences; compression order; intersequence similarity; intrasequence similarity; multiple DNA sequence compression; real datasets; Approximation algorithms; Bioinformatics; Complexity theory; Computational biology; DNA; Encoding; Biology and genetics; data compaction and compression; data dependencies; information theory;
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2015.2403370
Filename :
7047709