DocumentCode :
3036684
Title :
Bi-Directional Context Modeling with Combinatorial Structuring for Genome Sequence Compression
Author :
Wenrui Dai ; Hongkai Xiong
Author_Institution :
Dept. of Electron. Eng., Shanghai Jiao Tong Univ., Shanghai, China
fYear :
2015
fDate :
7-9 April 2015
Firstpage :
442
Lastpage :
442
Abstract :
Summary form only given. This paper proposes a bi-directional context modeling (BCM) technique for reference-free genome sequence compression. BCM constructs its contexts by combining arbitrary predicted symbols in two directions, corresponding to approximate repeats and non-repeat regions. It can therefore predict DNA sequences sequentially with weighted conditional probabilities that simultaneously exploit the correlations among matched approximate repeats and fit the variable-order statistics of non-repeat regions. Moreover, BCM eliminates the overhead of pointer information for specifying approximate repeats, since it is synchronized between the encoder and the decoder. In theory, we show that the upper bounds on the excess model redundancy introduced by BCM vanish as the sequence size grows. Experimental results show that BCM outperforms state-of-the-art reference-free compressors such as FCM and CTW+LZ.
Keywords :
DNA; biology computing; genomics; probability; statistics; BCM technique; DNA sequence prediction; approximate repeats; bi-directional context modeling; combinatorial structuring; decoder; encoder; excess model redundancy; nonrepeat region; reference-free genome sequence compression; variable-order statistics; weighted conditional probabilities; Bidirectional control; Bioinformatics; Context; Context modeling; DNA; Encoding; Genomics;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Compression Conference (DCC), 2015
Conference_Location :
Snowbird, UT
ISSN :
1068-0314
Type :
conf
DOI :
10.1109/DCC.2015.67
Filename :
7149305