Title :
Research on a Clustering Data De-Duplication Mechanism Based on Bloom Filter
Author :
Wang, Guohua ; Zhao, Yuelong ; Xie, Xiaoling ; Liu, Lin
Author_Institution :
Sch. of Software Eng., South China Univ. of Technol., Guangzhou, China
Abstract :
Recently, data de-duplication, an emerging technology, has received broad attention from both academia and industry. Some research focuses on how to eliminate more redundant data, while other work investigates how to perform de-duplication at high speed. In this paper, we aim to reduce the time and space requirements of data de-duplication. We describe a clustering architecture with multiple nodes in which all nodes perform chunk-level data de-duplication in parallel, which is expected to improve performance noticeably. We also propose a new technique called "Fingerprint Summary": each node keeps in memory a compact summary of the chunk fingerprints of every other node. When checking for duplicate chunks, each node queries its local chunk hash database and then, if necessary, the Fingerprint Summary to eliminate inter-node redundant chunks. This substantially reduces the storage capacity requirement.
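The lookup order described in the abstract (local chunk hash database first, then the Fingerprint Summary) can be illustrated with a minimal sketch. It is not the authors' implementation: the class names `BloomFilter` and `Node`, the SHA-1 fingerprints, the filter size, and the number of hash functions are all assumptions chosen for illustration, and the peer summary is shared by reference rather than exchanged over a network.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: false positives possible, false negatives not."""

    def __init__(self, num_bits=8192, num_hashes=4):  # sizes are assumptions
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive the bit positions from salted SHA-1 digests (illustrative choice).
        for i in range(self.num_hashes):
            digest = hashlib.sha1(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


class Node:
    """Hypothetical cluster node holding a local index and peer summaries."""

    def __init__(self, name):
        self.name = name
        self.local_fingerprints = set()   # stands in for the chunk hash database
        self.own_summary = BloomFilter()  # summary this node would send to peers
        self.peer_summaries = {}          # peer name -> that peer's BloomFilter

    def store_chunk(self, chunk):
        fp = hashlib.sha1(chunk).digest()
        # Step 1: check the local chunk hash database.
        if fp in self.local_fingerprints:
            return "local-duplicate"
        # Step 2: check the in-memory summaries of the other nodes.
        for peer, summary in self.peer_summaries.items():
            if fp in summary:
                # A Bloom filter hit is only probable; a real system
                # would presumably confirm it with the peer.
                return "possible-duplicate-on:" + peer
        self.local_fingerprints.add(fp)
        self.own_summary.add(fp)
        return "stored"


a, b = Node("a"), Node("b")
b.peer_summaries["a"] = a.own_summary  # shared by reference for simplicity
print(a.store_chunk(b"hello"))
print(a.store_chunk(b"hello"))
print(b.store_chunk(b"hello"))
```

Because a Bloom filter never reports a stored fingerprint as absent, a miss in every summary means the chunk is new to the cluster as far as the summaries know, while a hit only indicates a probable duplicate on that peer.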
Keywords :
data compression; data handling; filtering theory; pattern clustering; storage management; Bloom filter; chunk hash database; chunk level data deduplication; clustering architecture; clustering data deduplication mechanism; fingerprint summary; internode redundant chunk; time space requirement reduction; Arrays; Databases; Fingerprint recognition; Protocols; Redundancy; Space technology;
Conference_Title :
Multimedia Technology (ICMT), 2010 International Conference on
Conference_Location :
Ningbo
Print_ISBN :
978-1-4244-7871-2
DOI :
10.1109/ICMULT.2010.5630395