DocumentCode :
1784749
Title :
Efficient algorithms for the compression of FASTQ files
Author :
Saha, Simanto ; Rajasekaran, Sanguthevar
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. of Connecticut, Storrs, CT, USA
fYear :
2014
fDate :
2-5 Nov. 2014
Firstpage :
82
Lastpage :
85
Abstract :
Since the introduction of the Sanger sequencing technology in 1977 by Frederick Sanger and his colleagues, there has been an explosion of sequence data. The cost of storing, processing, and analyzing these data is becoming excessively high. As a result, it is extremely important to develop efficient data compression and data reduction techniques. Standard data compression tools, however, are not well suited to biological data, which contain many repetitive regions and can exhibit high similarity among sequences. Specialized algorithms are therefore needed to compress biological data effectively. In this paper we propose novel algorithms for compressing FASTQ files. Extensive and rigorous experiments show that our proposed algorithms are competitive with, and indeed outperform, the best known algorithms for this problem.
Keywords :
bioinformatics; data compression; data reduction; sequences; FASTQ files; Sanger sequencing technology; biological data; cost-of-storage; data analysis; data processing; data reduction techniques; efficient compression algorithms; sequence data explosion; specialized algorithms; standard data compression tools; Bioinformatics; Clustering algorithms; Compression algorithms; Data compression; Encoding; Genomics; Sequential analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Bioinformatics and Biomedicine (BIBM), 2014 IEEE International Conference on
Conference_Location :
Belfast
Type :
conf
DOI :
10.1109/BIBM.2014.6999132
Filename :
6999132