• DocumentCode
    607786
  • Title
    A new PPM model for quality score compression

  • Author
    Akgun, Mete ; Sagiroglu, M.S.

  • Author_Institution
    Tubitak BILGEM, Gebze, Turkey
  • fYear
    2013
  • fDate
    24-26 April 2013
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    Next Generation Sequencing (NGS) platforms generate nucleotide sequences together with header data and quality information. These platforms may produce gigabyte-scale datasets. The biggest problem of NGS technology is the storage of these datasets. Nucleotide sequences, supporting information and quality scores are stored in FASTQ format. In this paper, we consider the quality scores and propose an algorithm for their lossless compression. We look for a model that gives the lowest entropy on quality score data, and we combine this statistical model with arithmetic coding to compress the quality score data as compactly as possible. We compare its performance with that of general-purpose text compression utilities such as bzip2, gzip and ppmd, and with that of existing compression algorithms for quality scores. We show that our compression algorithm outperforms both groups.
  • Keywords
    arithmetic codes; bioinformatics; data compression; entropy; statistical analysis; FASTQ format; NGS technology; PPM model; arithmetic coding; gigabyte-scale datasets; header data; next generation sequencing platforms; nucleotide sequences; prediction by partial matching; quality information; quality score compression; quality score data; statistical model; text compression utility; Adaptation models; Bioinformatics; Compression algorithms; Data compression; Data models; Genomics; Sequential analysis; Compression; FASTQ; Prediction by Partial Matching (PPM); Quality Scores;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal Processing and Communications Applications Conference (SIU), 2013 21st
  • Conference_Location
    Haspolat
  • Print_ISBN
    978-1-4673-5562-9
  • Electronic_ISBN
    978-1-4673-5561-2
  • Type
    conf
  • DOI
    10.1109/SIU.2013.6531447
  • Filename
    6531447
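
The abstract's idea of searching for the model with the lowest entropy on quality score data can be illustrated with a minimal sketch. This is not the PPM model of the paper: it is a plain fixed-order adaptive context model with add-one smoothing and no escape mechanism, and it only estimates the ideal code length an arithmetic coder would approach under that model. The function name `adaptive_code_length_bits`, the `order` parameter and the default alphabet size of 94 (the printable range commonly used for FASTQ quality characters) are assumptions made for illustration.

```python
import math
from collections import defaultdict


def adaptive_code_length_bits(quality, order, alphabet_size=94):
    """Ideal code length, in bits, of a quality string under an adaptive
    fixed-order context model with add-one (Laplace) smoothing.

    Hypothetical helper for illustration only; it is not the paper's model.
    """
    # counts[context][symbol] = times `symbol` has followed `context` so far
    counts = defaultdict(lambda: defaultdict(int))
    # totals[context] = times `context` has been seen so far
    totals = defaultdict(int)
    bits = 0.0
    for i, sym in enumerate(quality):
        # Context is the previous `order` symbols (shorter at the start).
        ctx = quality[max(0, i - order):i]
        # Smoothed probability estimated from symbols seen before position i.
        p = (counts[ctx][sym] + 1) / (totals[ctx] + alphabet_size)
        # An arithmetic coder spends about -log2(p) bits on this symbol.
        bits -= math.log2(p)
        # Update the model only after coding, as a decoder would.
        counts[ctx][sym] += 1
        totals[ctx] += 1
    return bits


if __name__ == "__main__":
    sample = "IIIIHHHHIIIIHHHHIIIIHHHH"  # made-up quality string
    for k in range(3):
        print(k, round(adaptive_code_length_bits(sample, k), 2))
```

Comparing the returned bit counts across values of `order` on real quality lines is one simple way to see which context length models the data best. A PPM model would additionally blend several orders through escape probabilities.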