  • DocumentCode
    149686
  • Title
    Exploring deep Markov models in genomic data compression using sequence pre-analysis
  • Author
    Pratas, Diogo ; Pinho, Armando J.
  • Author_Institution
    DETI/IEETA, Signal Process. Lab., Univ. of Aveiro, Aveiro, Portugal
  • fYear
    2014
  • fDate
    1-5 Sept. 2014
  • Firstpage
    2395
  • Lastpage
    2399
  • Abstract
    The pressure to find efficient genomic compression algorithms is being felt worldwide, as evidenced by several prizes and competitions. In this paper, we propose a compression algorithm that relies on a pre-analysis of the data before compression, with the aim of identifying regions of low complexity. This strategy enables us to use deeper context models, supported by hash-tables, without requiring huge amounts of memory. As an example, context depths as large as 32 are attainable for alphabets of four symbols, as is the case for genomic sequences. These deeper context models show very high compression capabilities on highly repetitive genomic sequences, yielding improvements over previous algorithms. Furthermore, this method is universal, in the sense that it can be applied to any type of textual data (such as quality-scores).
  • Keywords
    Markov processes; biology computing; data analysis; data compression; genomics; data sequence pre-analysis; deep Markov models; genomic data compression algorithm; hash-tables; low complexity regions; repetitive genomic sequences; textual data; Bioinformatics; Context; Context modeling; DNA; Data compression; Data models; Genomics; Genomic data compression; finite-context models
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal Processing Conference (EUSIPCO), 2014 Proceedings of the 22nd European
  • Conference_Location
    Lisbon
  • Type
    conf
  • Filename
    6952879
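
The abstract describes deep finite-context (Markov) models backed by hash-tables. The following is a minimal illustrative sketch of that general idea, not the authors' algorithm: it omits the pre-analysis step entirely, and the names `estimate_bits`, `order` and `alpha` are hypothetical. It assumes a four-symbol DNA alphabet, so a context of depth 32 packs into a 64-bit integer key, and a Python dict stands in for the hash-table so that only contexts actually observed consume memory.

```python
import math

# 2-bit code per nucleotide (assumed alphabet: A, C, G, T only).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def estimate_bits(seq, order=32, alpha=1.0):
    """Estimate the code length (in bits) of `seq` under an adaptive
    order-`order` finite-context model with additive smoothing `alpha`.

    A dense table for order 32 would need 4**32 (about 1.8e19) rows;
    the dict below only stores contexts that occur in the input.
    """
    mask = (1 << (2 * order)) - 1   # keeps the last `order` symbols
    table = {}                      # context key -> [count_A, count_C, count_G, count_T]
    ctx = 0                         # initial context: treated as all 'A' (an assumption)
    bits = 0.0
    for ch in seq:
        s = CODE[ch]
        row = table.setdefault(ctx, [0, 0, 0, 0])
        # Probability estimate from the counts seen so far in this context.
        p = (row[s] + alpha) / (sum(row) + 4 * alpha)
        bits -= math.log2(p)
        row[s] += 1                         # adaptive update after "coding" the symbol
        ctx = ((ctx << 2) | s) & mask       # slide the context window by one symbol
    return bits
```

On a highly repetitive input such as `"ACGT" * 200`, the estimate falls well below the 2 bits per base of a plain fixed-length encoding, which is the kind of behaviour the abstract attributes to deep context models on repetitive sequences.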