Title :
Efficient direct search on compressed genomic data
Author :
Xiaochun Yang ; Bin Wang ; Chen Li ; Jiaying Wang ; Xiaohui Xie
Author_Institution :
Coll. of Inf. Sci. & Eng., Northeastern Univ., Shenyang, China
Abstract :
The explosive growth in the amount of data produced by next-generation sequencing poses significant computational challenges in storing, transmitting, and querying these data efficiently and accurately. A unique characteristic of genomic sequence data is that many sequences are highly similar to each other, which has motivated the idea of compressing sequence data by storing only their differences from a reference sequence, thereby drastically cutting the storage cost. However, an unresolved question in this area is whether it is possible to search directly on the compressed data and, if so, how. Here we show that directly querying compressed genomic sequence data is possible and can be done efficiently. We describe a set of novel index structures and algorithms for this purpose, and present several optimization techniques to reduce the space requirement and query response time. We demonstrate the advantage of our method and compare it against existing ones through a thorough experimental study on real genomic data.
Keywords :
bioinformatics; data compression; genomics; indexing; query processing; compressed genomic data; data querying; data storage; data transmission; direct search; genomic sequence data; index structure; next-generation sequencing; optimization technique; query response time; sequence data compression; space requirement reduction; Bioinformatics; Genomics; Indexes; Niobium; Pattern matching; Sequential analysis; Silicon;
Conference_Title :
Data Engineering (ICDE), 2013 IEEE 29th International Conference on
Conference_Location :
Brisbane, QLD
Print_ISBN :
978-1-4673-4909-3
ISSN :
1063-6382
DOI :
10.1109/ICDE.2013.6544889