Title :
Detecting Duplicates over Sliding Windows with RAM-Efficient Detached Counting Bloom Filter Arrays
Author :
Wei, Jiansheng ; Jiang, Hong ; Zhou, Ke ; Feng, Dan ; Wang, Hua
Author_Institution :
Wuhan Nat. Lab. for Optoelectron., Huazhong Univ. of Sci. & Technol., Wuhan, China
Abstract :
Detecting duplicates over sliding windows is an important technique for monitoring and analysing data streams. Since recording the exact information of elements in a sliding window can be RAM-resource-intensive and introduce an unacceptable search complexity, several approximate membership representation schemes have been proposed to build in-memory fast indices. However, various challenges facing RAM utilization and scalability remain. This paper proposes a Detached Counting Bloom filter Array (DCBA) to flexibly and efficiently detect duplicates over sliding windows. A DCBA consists of an array of detached counting Bloom filters (DCBFs), where each DCBF is essentially a Bloom filter that is associated with a detached timer (counter) array. The DCBA scheme functions as a circular FIFO queue and keeps a filling DCBF for accommodating fresh elements and a decaying DCBF for evicting stale elements. DCBA allows the timer arrays belonging to fully filled DCBFs to be offloaded to disks to greatly improve the memory space efficiency. The fully filled DCBFs will remain stable until their elements become stale, which allows a DCBA to be efficiently replicated for the purpose of data reliability or information sharing. Further, DCBA can be cooperatively maintained by clustered nodes, which provides scalable solution for mining massive data streams. Mathematical analysis and experimental results show that a DCBA (containing 64 DCBFs) requires less than 10% of its components to be kept in RAM while maintaining more than 95% of its query performance, which significantly outperforms existing schemes in memory efficiency and scalability.
Keywords :
data analysis; data mining; data structures; mathematical analysis; queueing theory; random-access storage; RAM scalability; RAM utilization; RAM-efficient detached counting Bloom filter arrays; circular FIFO queue; clustered nodes; counter array; data reliability; data stream analysis; data stream monitoring; data structure; detached timer array; duplicate detection; in-memory fast index; information sharing; massive data stream mining; mathematical analysis; membership representation; memory efficiency; memory space efficiency; query performance; search complexity; sliding windows; Arrays; Filling; Information filters; Monitoring; Radiation detectors; Random access memory; Timing; Bloom filter; data stream; duplicate detection; sliding window;
Conference_Titel :
Networking, Architecture and Storage (NAS), 2011 6th IEEE International Conference on
Conference_Location :
Dalian, Liaoning
Print_ISBN :
978-1-4577-1172-5
Electronic_ISBN :
978-0-7695-4509-7
DOI :
10.1109/NAS.2011.37