DocumentCode :
2397761
Title :
BloomFlash: Bloom Filter on Flash-Based Storage
Author :
Debnath, Biplob ; Sengupta, Sudipta ; Li, Jin ; Lilja, David J. ; Du, David H C
Author_Institution :
EMC Corp., Santa Clara, CA, USA
fYear :
2011
fDate :
20-24 June 2011
Firstpage :
635
Lastpage :
644
Abstract :
The bloom filter is a probabilistic data structure that provides a compact representation of a set of elements. To keep false positive probabilities low, the size of the bloom filter must be dimensioned a priori to be linear in the maximum number of keys inserted, with the linearity constant ranging typically from one to few bytes. A bloom filter is most commonly used as an in memory data structure, hence its size is limited by the availability of RAM space on the machine. As datasets have grown over time to Internet scale, so have the RAM space requirements of bloom filters. If sufficient RAM space is not available, we advocate that flash memory may serve as a suitable medium for storing bloom filters, since it is about one-tenth the cost of RAM per GB while still providing access times orders of magnitude faster than hard disk. We present BLOOMFLASH, a bloom filter designed for flash memory based storage, that provides a new dimension of trade off with bloom filter access times to reduce RAM space usage (and hence system cost). The simple design of a single flat bloom filter on flash suffers from many performance bottlenecks, including in-place bit updates that are inefficient on flash and multiple reads and random writes spread out across many flash pages for a single lookup or insert operation. To mitigate these performance bottlenecks, BLOOMFLASH leverages two key design innovations: (i) buffering bit updates in RAM and applying them in bulk to flash that helps to reduce random writes to flash, and (ii) a hierarchical bloom filter design consisting of component bloom filters, stored one per flash page, that helps to localize reads and writes on flash. We use two real-world data traces taken from representative bloom filter applications to drive and evaluate our design. BLOOMFLASH achieves bloom filter access times in the range of few tens of microseconds, thus allowing up to order of tens of thousands operations per sec.
Keywords :
data structures; flash memories; random-access storage; storage management; BloomFlash; Internet scale; RAM space; RAM space usage reduction; buffering bit updates; flash memory based storage; flash-based storage; hierarchical bloom filter design; memory data structure; probabilistic data structure; Ash; Buffer storage; Data structures; Games; Performance evaluation; Random access memory; Servers; Bloom Filter; Data Center Applications; Flash Memory; Hierarchical Design; Solid State Disk (SSD);
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Distributed Computing Systems (ICDCS), 2011 31st International Conference on
Conference_Location :
Minneapolis, MN
ISSN :
1063-6927
Print_ISBN :
978-1-61284-384-1
Electronic_ISBN :
1063-6927
Type :
conf
DOI :
10.1109/ICDCS.2011.44
Filename :
5961740
Link To Document :
بازگشت