DocumentCode :
2205693
Title :
ProSy: A similarity based inline deduplication system for primary storage
Author :
Xin Du ; Weizheng Hu ; Qiang Wang ; Fang Wang
Author_Institution :
Wuhan National Laboratory for Optoelectronics, School of Computer, Huazhong University of Science and Technology, China
fYear :
2015
fDate :
6-7 Aug. 2015
Firstpage :
195
Lastpage :
204
Abstract :
Data deduplication can reduce cost and enhance throughput in backup and archiving systems. Recently, it has become increasingly popular to apply this technique to primary storage systems, where data is actively used by enterprise business applications. However, state-of-the-art deduplication systems for primary storage mainly provide offline solutions, which require a sufficient time window as well as additional space and energy. The biggest challenge for an inline deduplication solution is achieving acceptable performance in terms of data deduplication ratio, access latency, system throughput and management overhead. In this paper, we propose a high-accuracy similarity algorithm and, based on it, construct ProSy, a real-time inline deduplication system for primary storage that achieves acceptable overall performance without requiring file layout information. ProSy is more reliable because it uses byte-by-byte comparison instead of strong hash comparison to guarantee data integrity. The main idea behind ProSy is to minimize the size of the comparison set by grouping similar file segments into the same category when performing data deduplication. For each file segment, ProSy searches for common data only within the category to which that segment belongs. Experimental evaluation on real-world datasets shows that ProSy is practical and achieves satisfactory performance. Compared with a common file system, ProSy achieves more than 60% of the maximum data deduplication ratio, a 27% reduction in latency, about 2.7% CPU utilization, 83% write throughput and 144% read throughput.
Keywords :
Data structures; File systems; Fingerprint recognition; Layout; Metadata; Servers; Throughput; inline deduplication; primary storage; similarity
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Networking, Architecture and Storage (NAS), 2015 IEEE International Conference on
Conference_Location :
Boston, MA, USA
Type :
conf
DOI :
10.1109/NAS.2015.7255230
Filename :
7255230