A Fast Duplicate Chunk Identifying Method Based on Hierarchical Indexing Structure

Author

Can Wang ; Zhi-guang Qin ; Lei Yang ; Juan Wang

Author_Institution

Sch. of Comput. Sci. & Eng., Univ. of Electron. Sci. & Technol. of China, Chengdu, China

fYear

2012

Firstpage

624

Lastpage

627

Abstract

To relieve the disk bottleneck of deduplication systems without depending on data locality, a fast duplicate chunk identifying method based on a hierarchical indexing structure is proposed. In this method, the traditional flat indexing structure is vertically divided into two layers, and only a handful of the most representative indices, selected according to Broder's theorem, are kept in RAM. Experimental results on real data that lack locality indicate that the deduplication performance of this method reaches 87.05% of the optimal value with a far smaller RAM requirement than current methods.
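
The two-layer idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how data is grouped or how many representatives are kept, so the segment grouping, the `TwoLayerIndex` class, the parameter `k`, and the choice of the `k` smallest SHA-1 fingerprints as representatives are all assumptions made for illustration. The selection rule leans on Broder's theorem, which states that the probability that two sets share the same minimum hash value equals their Jaccard similarity, so similar segments are likely to share a representative.

```python
import hashlib


def fingerprint(chunk: bytes) -> str:
    """SHA-1 fingerprint of a chunk (assumed hash; the abstract names none)."""
    return hashlib.sha1(chunk).hexdigest()


class TwoLayerIndex:
    """Hypothetical two-layer index: a small RAM layer over a full 'disk' layer."""

    def __init__(self, k: int = 1):
        self.k = k            # representatives kept in RAM per segment (assumption)
        self.ram = {}         # upper layer: representative fingerprint -> segment id
        self.disk = {}        # lower layer: segment id -> all fingerprints (stands in for disk)
        self.disk_reads = 0   # counts lower-layer lookups

    def _representatives(self, fps):
        # Min-wise sampling: keep the k smallest fingerprints of the segment.
        return sorted(fps)[: self.k]

    def dedup_segment(self, chunks):
        """Return the chunks of this segment that are not identified as duplicates."""
        fps = [fingerprint(c) for c in chunks]
        reps = self._representatives(set(fps))

        # Only segments that share a representative are fetched from the lower layer.
        candidates = {self.ram[r] for r in reps if r in self.ram}
        known = set()
        for seg_id in candidates:
            self.disk_reads += 1
            known |= self.disk[seg_id]

        new_chunks, seen = [], set()
        for chunk, fp in zip(chunks, fps):
            if fp not in known and fp not in seen:
                new_chunks.append(chunk)
            seen.add(fp)

        # Record the segment in both layers.
        seg_id = len(self.disk)
        self.disk[seg_id] = set(fps)
        for r in reps:
            self.ram.setdefault(r, seg_id)
        return new_chunks


index = TwoLayerIndex(k=2)
segment = [b"a", b"b", b"c", b"d"]
first_pass = index.dedup_segment(segment)    # nothing indexed yet, all chunks are new
second_pass = index.dedup_segment(segment)   # same segment again, found via RAM layer
```

In this sketch the RAM layer grows with the number of segments rather than the number of chunks, which is the source of the memory saving. Duplicates in a segment that shares no representative with any stored segment are missed, so a scheme of this kind trades some deduplication ratio for memory.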

Keywords

indexing; random-access storage; Broder theorem; RAM; deduplication system; disk bottleneck problem; duplicate chunk identifying method; flat indexing structure; hierarchical indexing structure; representative indices; Educational institutions; Feature extraction; Random access memory; Throughput; Writing; data locality; deduplication; disk bottleneck

fLanguage

English

Publisher

IEEE

Conference_Titel

Industrial Control and Electronics Engineering (ICICEE), 2012 International Conference on

Conference_Location

Xi'an

Print_ISBN

978-1-4673-1450-3

Type

conf

DOI

10.1109/ICICEE.2012.169

Filename

6322458