DocumentCode :
2762482
Title :
Constructing Suffix Array During Decompression
Author :
Mahmoud, M. ; Abouelhoda, M.I. ; Kandil, A. ; Elbialy, A.
Author_Institution :
Fac. of Eng., Cairo Univ., Giza
fYear :
2008
fDate :
18-20 Dec. 2008
Firstpage :
1
Lastpage :
4
Abstract :
The suffix array is an indexing data structure used in a wide range of bioinformatics applications. Biological DNA sequences are available for download from public servers as compressed files, typically produced with the popular lossless compression program gzip [1]. The straightforward way to construct the suffix array for such data is to decompress the sequence file, store it on disk, and then run a suffix array construction program on it. This approach, albeit feasible, requires disk access and discards valuable information contained in the compressed file. In this paper, we present an algorithm that constructs the suffix array during decompression, requiring no disk access and making use of the decompression information to build the suffix array.
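The abstract does not spell out the algorithm, so the following Python sketch is only an illustration under stated assumptions, not the authors' method. It shows (a) the baseline pipeline done entirely in memory with the standard-library `gzip` module, so no temporary file is written, and (b) the kind of "decompression information" an LZ77-style decoder sees: each back-reference states that two substrings are equal, so the suffixes starting at the copy's destination and source share a prefix at least as long as the copy. The token format and the names `suffix_array`, `lz77_decode`, and `suffix_array_from_gzip` are hypothetical. Real gzip (DEFLATE) streams additionally Huffman-code the tokens, and Python's `gzip`/`zlib` modules do not expose the LZ77 parse. The suffix array is built with generic prefix doubling, a swapped-in textbook technique.

```python
import gzip


def suffix_array(text: bytes) -> list[int]:
    """Build a suffix array by prefix doubling (O(n log^2 n) using sort)."""
    n = len(text)
    if n == 0:
        return []
    rank = list(text)  # initial rank = byte value of the first character
    sa = list(range(n))
    k = 1
    while True:
        # Sort by (rank of first k bytes, rank of next k bytes); -1 marks
        # "past the end", so a suffix that is a prefix of another sorts first.
        def key(i: int) -> tuple[int, int]:
            return (rank[i], rank[i + k] if i + k < n else -1)

        sa.sort(key=key)
        new_rank = [0] * n
        for j in range(1, n):
            new_rank[sa[j]] = new_rank[sa[j - 1]] + (key(sa[j]) != key(sa[j - 1]))
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct: order is final
            break
        k *= 2
    return sa


def suffix_array_from_gzip(blob: bytes) -> list[int]:
    """Baseline: decompress in memory (no temp file), then build the index."""
    return suffix_array(gzip.decompress(blob))


def lz77_decode(tokens):
    """Decode HYPOTHETICAL toy LZ77 tokens and record every copy.

    A token is either a bytes literal or an (offset, length) back-reference.
    Each record (dst, src, length) means text[dst:dst+length] equals
    text[src:src+length], so suffixes dst and src share >= length characters.
    """
    out = bytearray()
    copies = []
    for tok in tokens:
        if isinstance(tok, tuple):
            offset, length = tok
            src = len(out) - offset
            copies.append((len(out), src, length))
            for j in range(length):  # byte by byte so overlapping copies work
                out.append(out[src + j])
        else:
            out += tok
    return bytes(out), copies


text, copies = lz77_decode([b"A", b"C", b"G", b"T", (4, 3), b"A"])
print(text, copies)       # b'ACGTACGA' [(4, 0, 3)]
print(suffix_array(text))  # [7, 4, 0, 5, 1, 6, 2, 3]
```

In this toy run, the recorded copy `(4, 0, 3)` says suffixes 4 (`ACGA`) and 0 (`ACGTACGA`) share their first three characters, and they indeed sit next to each other in the resulting suffix array. An algorithm in the spirit of the abstract could exploit such facts to avoid redundant comparisons; how the paper actually does so is not described here.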
Keywords :
DNA; bioinformatics; data compression; data structures; bioinformatics; biological DNA sequences; compressed files; decompression; gzip; indexing data structure; lossless compression program; suffix array; Algorithm design and analysis; Bioinformatics; DNA; Data engineering; Data structures; File servers; Genomics; Indexing; Proteins; Sequences;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Biomedical Engineering Conference, 2008. CIBEC 2008. Cairo International
Conference_Location :
Cairo
Print_ISBN :
978-1-4244-2694-2
Electronic_ISBN :
978-1-4244-2695-9
Type :
conf
DOI :
10.1109/CIBEC.2008.4786040
Filename :
4786040