• DocumentCode
    74548
  • Title

    Efficiently Representing Membership for Variable Large Data Sets

  • Author

    Jiansheng Wei ; Hong Jiang ; Ke Zhou ; Dan Feng

  • Author_Institution
    Sch. of Comput. Sci. & Technol., Huazhong Univ. of Sci. & Technol., Wuhan, China
  • Volume
    25
  • Issue
    4
  • fYear
    2014
  • fDate
    April 2014
  • Firstpage
    960
  • Lastpage
    970
  • Abstract
    Cloud computing has raised new challenges for the membership representation scheme of storage systems that manage very large data sets. This paper proposes DBA, a dynamic Bloom filter array aimed at representing membership for variable large data sets in storage systems in a scalable way. DBA consists of dynamically created groups of space-efficient Bloom filters (BFs) to accommodate changes in set sizes. Within a group, BFs are homogeneous and the data layout is optimized at the bit level to enable parallel access and thus achieve high query performance. DBA can effectively control its query accuracy by partially adjusting the error rate of the constructing BFs, where each BF only represents an independent subset to help locate elements and confirm membership. Further, DBA supports element deletion by introducing a lazy update policy. We prototype and evaluate our DBA scheme as a scalable fast index in the MAD2 deduplication storage system. Experimental results reveal that DBA (with 64 BFs per group) shows significantly higher query performance than the state-of-the-art approach while scaling up to 160 BFs. DBA is also shown to excel in scalability, query accuracy, and space efficiency by theoretical analysis and experimental evaluation.
  • Keywords
    cloud computing; data handling; data structures; query processing; BF; MAD2 deduplication storage system; data layout; dynamic Bloom filter; membership representation scheme; query accuracy; query performance; storage systems; variable large data sets; Arrays; Distributed databases; Error analysis; Indexes; Peer-to-peer computing; Random access memory; Servers; Bloom filter; Data management; fast index; membership representation
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type

    jour

  • DOI
    10.1109/TPDS.2013.66
  • Filename
    6471979