DocumentCode
149686
Title
Exploring deep Markov models in genomic data compression using sequence pre-analysis
Author
Pratas, Diogo ; Pinho, Armando J.
Author_Institution
DETI/IEETA, Signal Process. Lab., Univ. of Aveiro, Aveiro, Portugal
fYear
2014
fDate
1-5 Sept. 2014
Firstpage
2395
Lastpage
2399
Abstract
The pressure to find efficient genomic compression algorithms is being felt worldwide, as evidenced by several prizes and competitions. In this paper, we propose a compression algorithm that relies on a pre-analysis of the data before compression, with the aim of identifying regions of low complexity. This strategy enables us to use deeper context models, supported by hash-tables, without requiring huge amounts of memory. As an example, context depths as large as 32 are attainable for alphabets of four symbols, as is the case for genomic sequences. These deeper context models achieve very high compression on highly repetitive genomic sequences, yielding improvements over previous algorithms. Furthermore, this method is universal, in the sense that it can be applied to any type of textual data (such as quality-scores).
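The idea of a deep finite-context model backed by a hash table can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `estimate_bits`, the estimator parameter `alpha`, and the choice to start from an all-zero context are assumptions made for illustration, and the pre-analysis step is not modeled. With four symbols, each symbol takes 2 bits, so a depth-32 context fits in a 64-bit key; a dense table would need 4^32 entries, whereas a hash table stores only the contexts that actually occur.

```python
from math import log2

# 2-bit code per nucleotide (four-symbol alphabet).
SYMBOL_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def estimate_bits(sequence, depth=32, alpha=1.0):
    """Estimate the cost, in bits, of coding `sequence` with an
    order-`depth` finite-context model whose counts live in a dict.

    Returns (total_bits, number_of_distinct_contexts_stored).
    `alpha` is an assumed additive smoothing parameter.
    """
    mask = (1 << (2 * depth)) - 1   # keep only the last `depth` symbols
    counts = {}                     # context key -> [nA, nC, nG, nT]
    context = 0                     # assumption: start as if preceded by A's
    total_bits = 0.0

    for ch in sequence:
        sym = SYMBOL_CODE[ch]
        c = counts.get(context, (0, 0, 0, 0))
        # Probability estimate from the counts seen so far in this context.
        p = (c[sym] + alpha) / (sum(c) + 4 * alpha)
        total_bits += -log2(p)
        # Update the counts, then slide the context window by one symbol.
        counts.setdefault(context, [0, 0, 0, 0])[sym] += 1
        context = ((context << 2) | sym) & mask

    return total_bits, len(counts)


if __name__ == "__main__":
    seq = "ACGT" * 200  # a highly repetitive toy sequence
    bits, n_contexts = estimate_bits(seq, depth=32)
    print(f"{bits / len(seq):.3f} bits/symbol, {n_contexts} contexts stored")
```

On a highly repetitive input such as the toy sequence above, only a handful of contexts are ever stored, and the estimated cost falls well below the 2 bits per symbol of a plain fixed-length encoding.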
Keywords
Markov processes; biology computing; data analysis; data compression; genomics; data sequence pre-analysis; deep Markov models; genomic data compression algorithm; hash-tables; low complexity regions; repetitive genomic sequences; textual data; Bioinformatics; Context; Context modeling; DNA; Data models; Genomic data compression; finite-context models
fLanguage
English
Publisher
IEEE
Conference_Titel
Signal Processing Conference (EUSIPCO), 2014 Proceedings of the 22nd European
Conference_Location
Lisbon
Type
conf
Filename
6952879