• DocumentCode
    610384
  • Title

    Efficient direct search on compressed genomic data

  • Author

    Xiaochun Yang ; Bin Wang ; Chen Li ; Jiaying Wang ; Xiaohui Xie

  • Author_Institution
    Coll. of Inf. Sci. & Eng., Northeastern Univ., Shenyang, China
  • fYear
    2013
  • fDate
    8-12 April 2013
  • Firstpage
    961
  • Lastpage
    972
  • Abstract
    The explosive growth in the amount of data produced by next-generation sequencing poses significant computational challenges on how to store, transmit and query these data, efficiently and accurately. A unique characteristic of the genomic sequence data is that many of them can be highly similar to each other, which has motivated the idea of compressing sequence data by storing only their differences to a reference sequence, thereby drastically cutting the storage cost. However, an unresolved question in this area is whether it is possible to perform search directly on the compressed data, and if so, how. Here we show that directly querying compressed genomic sequence data is possible and can be done efficiently. We describe a set of novel index structures and algorithms for this purpose, and present several optimization techniques to reduce the space requirement and query response time. We demonstrate the advantage of our method and compare it against existing ones through a thorough experimental study on real genomic data.
  • Keywords
    bioinformatics; data compression; genomics; indexing; query processing; compressed genomic data; data querying; data storage; data transmission; direct search; genomic sequence data; index structure; next-generation sequencing; optimization technique; query response time; sequence data compression; space requirement reduction; Bioinformatics; Genomics; Indexes; Niobium; Pattern matching; Sequential analysis; Silicon;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Data Engineering (ICDE), 2013 IEEE 29th International Conference on
  • Conference_Location
    Brisbane, QLD
  • ISSN
    1063-6382
  • Print_ISBN
    978-1-4673-4909-3
  • Electronic_ISBN
    1063-6382
  • Type

    conf

  • DOI
    10.1109/ICDE.2013.6544889
  • Filename
    6544889