DocumentCode :
3243882
Title :
A Novel Optimization Method to Improve De-duplication Storage System Performance
Author :
Liu, Chuanyi ; Xue, Yibo ; Ju, Dapeng ; Wang, Dongsheng
Author_Institution :
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
fYear :
2009
fDate :
8-11 Dec. 2009
Firstpage :
228
Lastpage :
235
Abstract :
Data de-duplication has become a commodity component in data-intensive storage systems. Compared with traditional storage paradigms, however, a de-duplication system eliminates duplicate or redundant data at the cost of adding several layers or functional components to the I/O path. These components are either CPU-intensive or I/O-intensive, which largely hinders overall system performance. To address these potential bottlenecks, this paper quantitatively analyzes the overhead of each main component introduced by de-duplication and then proposes two performance optimization methods. The first is parallel calculation of content-aware chunk identifiers, which exploits both inter-chunk and intra-chunk parallelism by using a task partition and chunk content distribution algorithm. Experiments demonstrate that it can improve system throughput by up to 150% while making much better use of multiprocessor resources. The second is storage pipelining, which overlaps CPU-bound, I/O-bound, and network communication tasks. With a dedicated five-stage storage pipeline designed for file archival operations, experimental results show that system throughput can increase by up to 25% on our workloads.
Keywords :
data compression; parallel programming; storage allocation; chunk content distribution algorithm; data de-duplication; data-intensive storage systems; parallel calculation; performance optimization methods; task partition algorithm; Computer science; Cost function; Cryptography; Information science; Laboratories; Optimization methods; Performance analysis; Pipeline processing; System performance; Throughput; Data De-duplication; Parallel Hash; Performance Optimization; Storage Pipeline;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel and Distributed Systems (ICPADS), 2009 15th International Conference on
Conference_Location :
Shenzhen
ISSN :
1521-9097
Print_ISBN :
978-1-4244-5788-5
Type :
conf
DOI :
10.1109/ICPADS.2009.103
Filename :
5395260