DocumentCode
2518704
Title
Adaptive pipeline for deduplication
Author
Ma, Jingwei ; Zhao, Bin ; Wang, Gang ; Liu, Xiaoguang
Author_Institution
Coll. of I.T., Nankai Univ., Tianjin, China
fYear
2012
fDate
16-20 April 2012
Firstpage
1
Lastpage
6
Abstract
Deduplication has become one of the hottest topics in the field of data storage. A number of methods for reducing the disk I/O caused by deduplication have been proposed, and others have been studied to accelerate the computational sub-tasks of deduplication. However, the order of the computational sub-tasks can significantly affect overall deduplication throughput, because the sub-tasks exhibit quite different workloads and concurrency depending on their order and on the data set. This paper proposes an adaptive pipelining model for the computational sub-tasks in deduplication that takes both the data type and the hardware platform into account. Using the compression ratio and duplicate ratio of the data stream, together with the compression and fingerprinting speeds on different processing units, as parameters, it determines the optimal order of the pipeline stages (computational sub-tasks) and assigns each stage to the processing unit that processes it fastest. In other words, the pipeline is “adaptive” with respect to both the data and the hardware. Experimental results show that the adaptive pipeline improves deduplication throughput by up to 50% compared with a plain fixed pipeline, suggesting that it is suitable for simultaneous deduplication of various data types on modern heterogeneous multi-core systems.
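The abstract describes choosing a stage order and a processing unit per stage from four parameters: duplicate ratio, compression ratio, compression speed, and fingerprinting speed. The following is a minimal sketch of that decision under assumptions that are not taken from the paper: there are only two stages (fingerprinting and compression); fingerprinting first drops the duplicate fraction before compression; compressing first means fingerprints are computed over compressed bytes; stages placed on the same unit run serially; and pipeline throughput is set by the busiest unit. The names `plan`, `stage_fractions`, `SPEEDS`, and all speed values are hypothetical and illustrative only.

```python
from itertools import permutations

# Hypothetical stage speeds in MB/s (bytes consumed by the stage) per processing unit.
# Illustrative values only, not measurements from the paper.
SPEEDS = {
    "fingerprint": {"cpu": 300.0},
    "compress": {"cpu": 200.0},
}


def stage_fractions(order, dup_ratio, comp_fraction):
    """Fraction of the original input bytes that each stage consumes under `order`.

    Assumption: fingerprinting removes dup_ratio of its input (duplicates are dropped),
    and compression shrinks its input to comp_fraction (compressed / original size).
    """
    fractions, frac = {}, 1.0
    for stage in order:
        fractions[stage] = frac
        if stage == "fingerprint":
            frac *= 1.0 - dup_ratio
        elif stage == "compress":
            frac *= comp_fraction
    return fractions


def plan(dup_ratio, comp_fraction, speeds=SPEEDS):
    """Return (order, assignment, throughput in MB/s of input) for the best stage order."""
    best = None
    for order in permutations(speeds):
        # Put each stage on the unit that runs it fastest, as the abstract describes.
        assignment = {s: max(speeds[s], key=speeds[s].get) for s in order}
        fractions = stage_fractions(order, dup_ratio, comp_fraction)
        # Stages sharing a unit serialize: sum the seconds per MB of input on each unit.
        busy = {}
        for s in order:
            unit = assignment[s]
            busy[unit] = busy.get(unit, 0.0) + fractions[s] / speeds[s][unit]
        # The busiest unit bounds the whole pipeline (the first stage always sees all input).
        throughput = 1.0 / max(busy.values())
        if best is None or throughput > best[2]:
            best = (order, assignment, throughput)
    return best


if __name__ == "__main__":
    print(plan(0.8, 0.5))  # highly duplicated, moderately compressible data
    print(plan(0.0, 0.2))  # no duplicates, highly compressible data
```

With these illustrative CPU-only numbers, `plan(0.8, 0.5)` puts fingerprinting before compression, while `plan(0.0, 0.2)` compresses first. When both stages share one unit, fingerprinting first wins exactly when (1 − comp_fraction) / fingerprint_speed < dup_ratio / compress_speed, so the best order depends on the data as well as on the hardware. Adding a faster unit to a stage's entry in `speeds` (for example a hypothetical `"gpu"` key for fingerprinting) changes the assignment, which is the hardware-adaptive half of the idea.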
Keywords
data compression; data reduction; input-output programs; multiprocessing systems; pipeline processing; storage management; adaptive pipelining model; compression ratio; compression speed; computational subtasks; data adaptive; data sets; data storage; data stream; disk I/O reduction; duplicate ratio; fingerprinting speed; hardware adaptive; heterogeneous multicore systems; overall deduplication throughput; Adaptation models; Computational modeling; Graphics processing unit; Hardware; Pipeline processing; Pipelines; Throughput;
fLanguage
English
Publisher
ieee
Conference_Title
Mass Storage Systems and Technologies (MSST), 2012 IEEE 28th Symposium on
Conference_Location
San Diego, CA
ISSN
2160-195X
Print_ISBN
978-1-4673-1745-0
Electronic_ISBN
2160-195X
Type
conf
DOI
10.1109/MSST.2012.6232377
Filename
6232377