Regional Information Center for Science and Technology (RICEST) - High-Throughput Compression of FASTQ Data with SeqDB

DocumentCode :

1755731

Title :

High-Throughput Compression of FASTQ Data with SeqDB

Author :

Howison, Mark

Author_Institution :

Center for Computation and Visualization, Brown University, Providence, RI, USA

Volume :

Issue :

fYear :

2013

fDate :

Jan.-Feb. 2013

Firstpage :

213

Lastpage :

218

Abstract :

Compression has become a critical step in storing next-generation sequencing (NGS) data sets because of both the increasing size and decreasing costs of such data. Recent research into efficiently compressing sequence data has focused largely on improving compression ratios. Yet, the throughputs of current methods now lag far behind the I/O bandwidths of modern storage systems. As biologists move their analyses to high-performance systems with greater I/O bandwidth, low-throughput compression becomes a limiting factor. To address this gap, we present a new storage model called SeqDB, which offers high-throughput compression of sequence data with minimal sacrifice in compression ratio. It achieves this by combining the existing multithreaded Blosc compressor with a new data-parallel byte-packing scheme, called SeqPack, which interleaves sequence data and quality scores.
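The abstract names the two ingredients (a byte-packing step that interleaves bases with quality scores, followed by a general-purpose compressor) but this record does not describe the actual SeqPack layout. The following is only a minimal sketch of one way such a packing could work, under assumptions that are not from the source: bases are limited to A, C, G, T, N; quality scores use an ASCII offset of 33 and are capped at 50, so that 5 x 51 = 255 combinations fit in one byte. The function names are hypothetical, and the standard-library `zlib` stands in for Blosc purely to keep the example self-contained.

```python
import zlib

BASES = "ACGTN"      # assumed alphabet (not taken from the record)
MAX_QUAL = 50        # assumed cap: 5 bases * 51 scores = 255 codes, fits one byte
PHRED_OFFSET = 33    # assumed ASCII offset for quality characters


def pack(seq: str, qual: str) -> bytes:
    """Combine each base and its quality score into a single byte."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    out = bytearray(len(seq))
    for i, (base, qchar) in enumerate(zip(seq, qual)):
        score = ord(qchar) - PHRED_OFFSET
        if not 0 <= score <= MAX_QUAL:
            raise ValueError(f"quality score {score} outside 0..{MAX_QUAL}")
        # BASES.index raises ValueError for any base outside the assumed alphabet.
        out[i] = BASES.index(base) * (MAX_QUAL + 1) + score
    return bytes(out)


def unpack(packed: bytes) -> tuple[str, str]:
    """Invert pack(): recover the sequence and quality strings."""
    seq_chars = []
    qual_chars = []
    for byte in packed:
        base_idx, score = divmod(byte, MAX_QUAL + 1)
        seq_chars.append(BASES[base_idx])
        qual_chars.append(chr(score + PHRED_OFFSET))
    return "".join(seq_chars), "".join(qual_chars)


def compress_read(seq: str, qual: str) -> bytes:
    """Pack, then compress. zlib is a stand-in here, not the Blosc compressor."""
    return zlib.compress(pack(seq, qual))


def decompress_read(blob: bytes) -> tuple[str, str]:
    """Reverse compress_read()."""
    return unpack(zlib.decompress(blob))


if __name__ == "__main__":
    seq, qual = "ACGTN", "!5I+S"
    blob = compress_read(seq, qual)
    assert decompress_read(blob) == (seq, qual)
```

Packing halves the number of bytes per read before the compressor sees it, and the per-byte loop has no dependencies between positions, which is the sense in which such a scheme can be data-parallel. Whether SeqPack uses this particular encoding is not stated in this record.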

Keywords :

DNA; biology computing; data compression; interleaved codes; FASTQ; I/O bandwidths; SeqDB; SeqPack; multithreaded Blosc compressor; next-generation sequencing; quality scores; sequence data interleaving; Arrays; Bandwidth; Bioinformatics; Genomics; Instruction sets; Libraries; Throughput; Compression; data storage; Computational Biology; Databases, Genetic; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA

fLanguage :

English

Journal_Title :

Computational Biology and Bioinformatics, IEEE/ACM Transactions on

Publisher :

IEEE

ISSN :

1545-5963

Type :

jour

DOI :

10.1109/TCBB.2012.160

Filename :

6378359

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1755731