• DocumentCode
    656169
  • Title
    Hysteresis Re-chunking Based Metadata Harnessing Deduplication of Disk Images
  • Author
    Bing Zhou ; Jiangtao Wen
  • Author_Institution
    Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
  • fYear
    2013
  • fDate
    1-4 Oct. 2013
  • Firstpage
    389
  • Lastpage
    398
  • Abstract
    Metadata-related overhead can significantly impact the performance of data deduplication systems, including the real duplication elimination ratio and the deduplication throughput. The amount of metadata produced is mainly determined by the chunking mechanism for the input data stream. In this paper, we propose a metadata harnessing deduplication (MHD) algorithm utilizing a duplication-distribution-based hysteresis re-chunking strategy. MHD harnesses the metadata by dynamically merging multiple non-duplicate chunks into one big chunk represented by one hash value while dividing big chunks straddling duplicate and non-duplicate data regions into small chunks represented with multiple hashes. Experimental results show that the proposed algorithm achieves a lower metadata overhead and a higher deduplication throughput for a given duplication elimination ratio, as compared with other state-of-the-art algorithms such as the Bimodal, Sub Chunk and Sparse Indexing algorithms.
  • Keywords
    meta data; storage management; MHD algorithm; bimodal algorithm; data storage system; deduplication throughput; disk images; duplication-distribution-based hysteresis re-chunking strategy; hysteresis re-chunking based metadata harnessing deduplication system; input data stream; metadata-related overhead; real duplication elimination ratio; sparse indexing algorithms; subchunk algorithm; Algorithm design and analysis; Hysteresis; Indexes; Magnetohydrodynamics; Merging; Random access memory; Throughput; Data Deduplication; Metadata Harnessing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Processing (ICPP), 2013 42nd International Conference on
  • Conference_Location
    Lyon
  • ISSN
    0190-3918
  • Type
    conf
  • DOI
    10.1109/ICPP.2013.48
  • Filename
    6687372
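
The merging idea described in the abstract (several consecutive non-duplicate chunks collapsed into one big chunk with a single hash, while duplicate regions keep small-chunk granularity) can be illustrated with a minimal sketch. This is not the paper's MHD algorithm: the function name `rechunk`, the `max_merge` limit, the use of SHA-1, and the in-memory `known_hashes` set are all assumptions made for illustration, and the hysteresis logic and the splitting of already-stored big chunks are not modeled.

```python
import hashlib


def sha1(data: bytes) -> str:
    """Hex digest used as the chunk fingerprint (SHA-1 is an assumption)."""
    return hashlib.sha1(data).hexdigest()


def rechunk(small_chunks, known_hashes, max_merge=8):
    """Hypothetical sketch: return a list of (hash, data, is_duplicate).

    Consecutive non-duplicate small chunks are merged into one big chunk
    with a single hash; duplicate small chunks are emitted individually.
    """
    out = []
    pending = []  # current run of consecutive non-duplicate small chunks

    def flush():
        # Emit the pending run as one merged chunk with one metadata entry.
        if pending:
            merged = b"".join(pending)
            out.append((sha1(merged), merged, False))
            pending.clear()

    for chunk in small_chunks:
        h = sha1(chunk)
        if h in known_hashes:
            # A duplicate ends the current non-duplicate run.
            flush()
            out.append((h, chunk, True))
        else:
            pending.append(chunk)
            if len(pending) == max_merge:
                flush()  # cap the size of a merged chunk
    flush()
    return out


if __name__ == "__main__":
    chunks = [b"a", b"b", b"c", b"d", b"e"]
    known = {sha1(b"c")}  # pretend chunk "c" was stored earlier
    for digest, data, dup in rechunk(chunks, known):
        print(digest[:8], data, "duplicate" if dup else "new")
```

In this toy input, five small chunks yield three metadata entries instead of five, because the two non-duplicate runs on either side of the duplicate are each stored under one hash.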