Title :
Constructing Suffix Array During Decompression
Author :
Mahmoud, M. ; Abouelhoda, M.I. ; Kandil, A. ; Elbialy, A.
Author_Institution :
Fac. of Eng., Cairo Univ., Giza
Abstract :
The suffix array is an indexing data structure used in a wide range of bioinformatics applications. Biological DNA sequences are available for download from public servers as compressed files, compressed with the popular lossless compression program gzip [1]. The straightforward way to construct the suffix array for such data is to decompress the sequence file, store it on disk, and then call a suffix array construction program. This approach, albeit feasible, requires disk access and throws away valuable information in the compressed file. In this paper, we present an algorithm that constructs the suffix array during decompression, requiring no disk access and using the decompression information to build the suffix array.
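To make the baseline described in the abstract concrete, the following is a minimal sketch of the "decompress first, then index" approach. It is not the paper's algorithm, which this record does not detail. The function names are hypothetical, the decompressed data is held in memory instead of being written to disk, and the construction is a naive sort chosen for clarity over speed.

```python
import gzip


def suffix_array(text: bytes) -> list:
    """Return the start positions of all suffixes of `text` in lexicographic order."""
    # Naive construction (an assumption for illustration): sort positions by the
    # suffix starting there. Worst case O(n^2 log n); real tools use faster methods.
    return sorted(range(len(text)), key=lambda i: text[i:])


def suffix_array_from_gzip(compressed: bytes) -> list:
    """Baseline: fully decompress the gzip data, then build the suffix array."""
    # The information in the compressed stream about repeated substrings is
    # discarded here, which is the inefficiency the abstract points out.
    return suffix_array(gzip.decompress(compressed))


compressed = gzip.compress(b"ACGTACG")
sa = suffix_array_from_gzip(compressed)
print(sa)  # [4, 0, 5, 1, 6, 2, 3]
```

For the toy sequence ACGTACG, the sorted suffixes are ACG, ACGTACG, CG, CGTACG, G, GTACG, TACG, which gives the positions printed above.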
Keywords :
DNA; bioinformatics; data compression; data structures; biological DNA sequences; compressed files; decompression; gzip; indexing data structure; lossless compression program; suffix array; Algorithm design and analysis; Data engineering; File servers; Genomics; Indexing; Proteins; Sequences
Conference_Title :
Biomedical Engineering Conference, 2008. CIBEC 2008. Cairo International
Conference_Location :
Cairo
Print_ISBN :
978-1-4244-2694-2
Electronic_ISBN :
978-1-4244-2695-9
DOI :
10.1109/CIBEC.2008.4786040