DocumentCode
610384
Title
Efficient direct search on compressed genomic data
Author
Xiaochun Yang ; Bin Wang ; Chen Li ; Jiaying Wang ; Xiaohui Xie
Author_Institution
Coll. of Inf. Sci. & Eng., Northeastern Univ., Shenyang, China
fYear
2013
fDate
8-12 April 2013
Firstpage
961
Lastpage
972
Abstract
The explosive growth in the amount of data produced by next-generation sequencing poses significant computational challenges in storing, transmitting, and querying these data efficiently and accurately. A unique characteristic of genomic sequence data is that many sequences can be highly similar to one another. This has motivated the idea of compressing sequence data by storing only their differences from a reference sequence, which drastically cuts the storage cost. However, an unresolved question in this area is whether it is possible to search directly on the compressed data, and if so, how. Here we show that directly querying compressed genomic sequence data is possible and can be done efficiently. We describe a set of novel index structures and algorithms for this purpose, and present several optimization techniques that reduce the space requirement and query response time. We demonstrate the advantages of our method and compare it against existing ones through a thorough experimental study on real genomic data.
Keywords
bioinformatics; data compression; genomics; indexing; query processing; compressed genomic data; data querying; data storage; data transmission; direct search; genomic sequence data; index structure; next-generation sequencing; optimization technique; query response time; sequence data compression; space requirement reduction; Bioinformatics; Genomics; Indexes; Pattern matching; Sequential analysis;
fLanguage
English
Publisher
ieee
Conference_Title
Data Engineering (ICDE), 2013 IEEE 29th International Conference on
Conference_Location
Brisbane, QLD
ISSN
1063-6382
Print_ISBN
978-1-4673-4909-3
Electronic_ISBN
1063-6382
Type
conf
DOI
10.1109/ICDE.2013.6544889
Filename
6544889