• DocumentCode
    2833074
  • Title

    Analysis of data fragments in deduplication system

  • Author

    Zhang, Zhike ; Jiang, ZeJun ; Peng, Chengzhang ; Liu, Zhiqiang

  • Author_Institution
    Sch. of Comput. Sci. & Technol., Northwestern Polytech. Univ., Xi'an, China
  • fYear
    2012
  • fDate
    June 30 2012-July 2 2012
  • Firstpage
    559
  • Lastpage
    563
  • Abstract
    To maximize write throughput, most deduplication systems and deduplication clusters store new chunks sequentially on disk. This method produces data fragments as the deduplication system grows. It is important to analyse the data fragments in a deduplication system and to understand their features. We analyse the features of data fragments in a deduplication system using three real-world datasets. We use File Fragment Degree (FFD) to quantify the data fragments of a file in a deduplication system. First, we implement Extreme Binning (EB) to collect the chunk addresses of every file in the dataset. Then, we design an FFD analyser to compute the FFD of every file from its chunk addresses and sizes. Finally, we analyse the FFD values. As far as we know, this is the first study of data fragments in deduplication systems. Our findings show that: 1) there is a large amount of data fragments in the deduplication system for various datasets; 2) for enterprise backup data, the amount of data fragments increases rapidly as the deduplication system grows; 3) for a dataset consisting mainly of small files, the amount of data fragments increases slowly as the deduplication system grows.
  • Keywords
    back-up procedures; data analysis; storage management; FFD analyser; FFD number; chunk address collection; data fragment analysis; data fragment feature analysis; data fragment quantization; deduplication clusters; deduplication system; enterprise backup data; extreme binning; file fragment degree; small files; writing throughput maximization; Algorithm design and analysis; Containers; Educational institutions; Linux; Throughput; USA Councils; Writing; Deduplication; backup data; data fragmentation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    System Science and Engineering (ICSSE), 2012 International Conference on
  • Conference_Location
    Dalian, Liaoning
  • Print_ISBN
    978-1-4673-0944-8
  • Electronic_ISBN
    978-1-4673-0943-1
  • Type

    conf

  • DOI
    10.1109/ICSSE.2012.6257249
  • Filename
    6257249