  • DocumentCode
    1610954
  • Title
    A Fast Duplicate Chunk Identifying Method Based on Hierarchical Indexing Structure
  • Author
    Can Wang; Zhi-guang Qin; Lei Yang; Juan Wang
  • Author_Institution
    Sch. of Comput. Sci. & Eng., Univ. of Electron. Sci. & Technol. of China, Chengdu, China
  • fYear
    2012
  • Firstpage
    624
  • Lastpage
    627
  • Abstract
    To solve the disk bottleneck problem of deduplication systems without depending on data locality, a fast duplicate chunk identifying method based on a hierarchical indexing structure is proposed. In this method, the traditional flat indexing structure is vertically divided into two layers, and only a handful of the most representative indices, selected according to Broder's theorem, are kept in RAM. Experimental results on real data that lack locality indicate that the deduplication performance of this method can reach 87.05% of the optimal value while requiring far less RAM than current methods.
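    The abstract states the idea but not the data structures, so the following is a hypothetical sketch, not the authors' implementation. It assumes pre-chunked segments, SHA-1 chunk fingerprints, and one representative per segment chosen as its minimum fingerprint. This uses the min-hash property from Broder's theorem: under a hash that behaves like a random permutation, two sets share the same minimum with probability equal to their Jaccard similarity. The RAM layer maps each representative to a bin id; a simulated disk layer maps each bin id to every chunk fingerprint in that bin. The names `TwoLayerIndex`, `fingerprint`, and `deduplicate` are invented for this example.

```python
import hashlib
from typing import Dict, List, Set, Tuple


def fingerprint(chunk: bytes) -> str:
    """SHA-1 fingerprint of a chunk (an assumed choice of hash)."""
    return hashlib.sha1(chunk).hexdigest()


class TwoLayerIndex:
    """Hypothetical two-layer chunk index (illustration only).

    Layer 1 (RAM): one representative fingerprint per segment -> bin id.
    Layer 2 (simulated disk): bin id -> all chunk fingerprints in the bin.
    """

    def __init__(self) -> None:
        self.ram_index: Dict[str, int] = {}       # representative fp -> bin id
        self.disk_bins: Dict[int, Set[str]] = {}  # bin id -> fingerprint set
        self.disk_reads = 0                       # simulated bin loads

    def deduplicate(self, chunks: List[bytes]) -> Tuple[int, int]:
        """Return (duplicate_chunks, new_chunks) for one segment."""
        fps = [fingerprint(c) for c in chunks]
        # Min-hash representative: similar segments are likely to share it.
        rep = min(fps)
        bin_id = self.ram_index.get(rep)
        if bin_id is None:
            # No similar segment known in RAM: open a new, empty bin.
            bin_id = len(self.disk_bins)
            self.ram_index[rep] = bin_id
            self.disk_bins[bin_id] = set()
        else:
            # Representative hit: one disk access loads the whole bin.
            self.disk_reads += 1
        known = self.disk_bins[bin_id]
        dup = new = 0
        for fp in fps:
            if fp in known:
                dup += 1
            else:
                new += 1
                known.add(fp)  # store the new chunk's fingerprint in the bin
        return dup, new


if __name__ == "__main__":
    idx = TwoLayerIndex()
    seg = [b"alpha", b"beta", b"gamma", b"delta"]
    print(idx.deduplicate(seg))        # (0, 4): first time, all chunks new
    print(idx.deduplicate(list(seg)))  # (4, 0): same segment, all duplicates
    print(idx.disk_reads)              # 1
```

    Because RAM holds one entry per segment rather than one per chunk, memory shrinks roughly by the average number of chunks per segment. The trade-off is that a similar segment whose minimum fingerprint differs is not matched, so some duplicates are stored again; this kind of approximation is why such schemes report deduplication below the optimal value.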
  • Keywords
    indexing; random-access storage; Broder theorem; RAM; deduplication system; disk bottleneck problem; duplicate chunk identifying method; flat indexing structure; hierarchical indexing structure; representative indices; Educational institutions; Feature extraction; Indexing; Random access memory; Throughput; Writing; data locality; deduplication; disk bottleneck; hierarchical indexing structure;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Industrial Control and Electronics Engineering (ICICEE), 2012 International Conference on
  • Conference_Location
    Xi'an
  • Print_ISBN
    978-1-4673-1450-3
  • Type
    conf
  • DOI
    10.1109/ICICEE.2012.169
  • Filename
    6322458