  • DocumentCode
    2518704
  • Title
    Adaptive pipeline for deduplication
  • Author
    Ma, Jingwei; Zhao, Bin; Wang, Gang; Liu, Xiaoguang
  • Author_Institution
    Coll. of I.T., Nankai Univ., Tianjin, China
  • fYear
    2012
  • fDate
    16-20 April 2012
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Deduplication has become one of the most active topics in data storage. Many methods for reducing the disk I/O caused by deduplication have been proposed, and other work has accelerated the computational sub-tasks of deduplication. However, the order of the computational sub-tasks can significantly affect overall deduplication throughput, because the sub-tasks exhibit quite different workloads and degrees of concurrency depending on their order and on the data set. This paper proposes an adaptive pipelining model for the computational sub-tasks in deduplication that takes both the data type and the hardware platform into account. Using the compression ratio and the duplicate ratio of the data stream, together with the compression speed and the fingerprinting speed on each processing unit, as parameters, it determines the optimal order of the pipeline stages (computational sub-tasks) and assigns each stage to the processing unit that executes it fastest. "Adaptive" thus refers to adaptivity to both the data and the hardware. Experimental results show that the adaptive pipeline improves deduplication throughput by up to 50% compared with a plain fixed pipeline, suggesting that it is well suited to simultaneous deduplication of various data types on modern heterogeneous multi-core systems.
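The abstract names the model's inputs (compression ratio, duplicate ratio, and per-unit compression and fingerprinting speeds) but not its equations. As a hedged illustration only, the Python sketch below assumes a simplified two-stage model that is not necessarily the authors': each stage runs on its own processing unit, a pipeline's throughput is bounded by its slowest stage, and a stage that sees only a fraction x of each input byte sustains speed / x input MB/s. The names `Workload`, `fastest`, `plan_pipeline`, `dup_ratio`, and `compressed_fraction` (compressed size divided by original size) are hypothetical, and all speeds are assumed to be in MB/s.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical model inputs; names and units are illustrative assumptions."""
    dup_ratio: float                      # fraction of input bytes that are duplicates, in [0, 1)
    compressed_fraction: float            # compressed size / original size, in (0, 1]
    compress_speed: dict[str, float]      # MB/s of compression on each processing unit
    fingerprint_speed: dict[str, float]   # MB/s of fingerprinting on each processing unit


def fastest(speeds: dict[str, float]) -> tuple[str, float]:
    """Hardware adaptivity: pick the processing unit with the highest speed for one stage."""
    unit = max(speeds, key=speeds.get)
    return unit, speeds[unit]


def plan_pipeline(w: Workload):
    """Return (stage order, unit per stage, predicted input throughput in MB/s).

    Assumes each stage gets a dedicated unit and the slowest stage bounds the pipeline.
    """
    c_unit, c_speed = fastest(w.compress_speed)
    f_unit, f_speed = fastest(w.fingerprint_speed)
    units = {"compress": c_unit, "fingerprint": f_unit}
    unique = 1.0 - w.dup_ratio

    # Order A: fingerprint all input, then compress only the unique (non-duplicate) data.
    comp_rate = c_speed / unique if unique > 0 else float("inf")
    rate_a = min(f_speed, comp_rate)
    # Order B: compress all input, then fingerprint the (smaller) compressed data.
    rate_b = min(c_speed, f_speed / w.compressed_fraction)

    # Data adaptivity: keep whichever order the model predicts is faster.
    if rate_a >= rate_b:
        return ("fingerprint", "compress"), units, rate_a
    return ("compress", "fingerprint"), units, rate_b


if __name__ == "__main__":
    w = Workload(dup_ratio=0.8, compressed_fraction=0.5,
                 compress_speed={"cpu": 100, "gpu": 300},
                 fingerprint_speed={"cpu": 600, "gpu": 400})
    print(plan_pipeline(w))
```

With these made-up inputs, heavy duplication shrinks the share of data the compression stage must handle, so the sketch places fingerprinting first (on the CPU) and compression second (on the GPU). With a lower duplicate ratio and a faster compressor, the same function can instead choose to compress first.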
  • Keywords
    data compression; data reduction; input-output programs; multiprocessing systems; pipeline processing; storage management; adaptive pipelining model; compression ratio; compression speed; computational subtasks; data adaptive; data sets; data storage; data stream; disk I/O reduction; duplicate ratio; fingerprinting speed; hardware adaptive; heterogeneous multicore systems; overall deduplication throughput; Adaptation models; Computational modeling; Graphics processing unit; Hardware; Pipeline processing; Pipelines; Throughput;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Mass Storage Systems and Technologies (MSST), 2012 IEEE 28th Symposium on
  • Conference_Location
    San Diego, CA
  • ISSN
    2160-195X
  • Print_ISBN
    978-1-4673-1745-0
  • Electronic_ISBN
    2160-195X
  • Type
    conf
  • DOI
    10.1109/MSST.2012.6232377
  • Filename
    6232377