DocumentCode
2205693
Title
ProSy: A similarity based inline deduplication system for primary storage
Author
Xin Du ; Weizheng Hu ; Qiang Wang ; Fang Wang
Author_Institution
Wuhan National Laboratory for Optoelectronics, School of Computer, Huazhong University of Science and Technology, China
fYear
2015
fDate
6-7 Aug. 2015
Firstpage
195
Lastpage
204
Abstract
Data deduplication can reduce cost and enhance throughput in backup and archiving systems. Recently, it has become increasingly popular to apply this technique in primary storage systems, where data is actively used by enterprise business applications. However, state-of-the-art deduplication systems for primary storage mainly provide offline solutions, which require a sufficient time window, additional space and energy. The biggest challenge for an inline deduplication solution is achieving acceptable performance in terms of data deduplication ratio, access latency, system throughput and management overhead. In this paper, we propose a high-accuracy similarity algorithm and, based on it, construct ProSy, a real-time inline deduplication system for primary storage that achieves acceptable overall performance without requiring file layout information. ProSy is more reliable because it uses byte-by-byte comparison instead of strong-hash comparison to guarantee data integrity. The main idea behind ProSy is to minimize the size of the comparison set by grouping similar file segments into the same category when performing data deduplication. For each file segment, ProSy searches for common data only within the category to which that segment belongs. The experimental evaluation on real-world datasets shows that ProSy is practical and achieves satisfactory performance. Compared with a conventional file system, ProSy achieves more than 60% of the maximum data deduplication ratio, a 27% reduction in latency, about 2.7% CPU utilization, 83% write throughput and 144% read throughput.
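The main idea stated in the abstract, grouping similar segments into categories, searching for duplicates only within a segment's category, and confirming matches byte by byte, can be illustrated with a minimal sketch. It is not ProSy's implementation. The abstract does not describe the similarity algorithm, so a min-hash style representative (the smallest CRC32 over a segment's chunks) is swapped in as the category key. The fixed chunk and segment sizes (`CHUNK_SIZE`, `SEGMENT_CHUNKS`), the class `CategoryDedup`, and the use of CRC32 as a candidate filter are all hypothetical.

```python
import zlib
from collections import defaultdict

CHUNK_SIZE = 4096      # hypothetical fixed chunk size in bytes
SEGMENT_CHUNKS = 16    # hypothetical number of chunks per segment


def split(data: bytes, size: int) -> list[bytes]:
    """Split data into consecutive pieces of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def segment_feature(chunks: list[bytes]) -> int:
    """Assumed similarity feature (min-hash style, not ProSy's algorithm):
    the smallest CRC32 among the segment's chunks. Segments that share
    many chunks are likely to share this value and land in one category."""
    return min(zlib.crc32(c) for c in chunks)


class CategoryDedup:
    """Hypothetical sketch: each segment is deduplicated only against
    chunks previously stored in its own category."""

    def __init__(self) -> None:
        # category -> weak hash -> ids of stored chunks with that hash
        self.index: dict[int, dict[int, list[int]]] = defaultdict(
            lambda: defaultdict(list))
        self.store: list[bytes] = []  # unique chunk contents, id = position

    def _lookup_or_store(self, category: int, chunk: bytes) -> int:
        bucket = self.index[category][zlib.crc32(chunk)]
        for cid in bucket:
            # The weak hash only selects candidates; byte-by-byte equality
            # decides, so a hash collision never merges different data.
            if self.store[cid] == chunk:
                return cid
        self.store.append(chunk)
        cid = len(self.store) - 1
        bucket.append(cid)
        return cid

    def write(self, data: bytes) -> list[int]:
        """Return the chunk ids ("recipe") that reconstruct `data`."""
        recipe = []
        for segment in split(data, CHUNK_SIZE * SEGMENT_CHUNKS):
            chunks = split(segment, CHUNK_SIZE)
            category = segment_feature(chunks)
            for chunk in chunks:
                recipe.append(self._lookup_or_store(category, chunk))
        return recipe

    def read(self, recipe: list[int]) -> bytes:
        """Reassemble the original bytes from a recipe."""
        return b"".join(self.store[cid] for cid in recipe)
```

Writing the same file twice stores its chunks once, and `read` returns the original bytes. Because the search is restricted to one category, a duplicate chunk inside a segment classified differently is stored again; in this sketch that is the price of a smaller comparison set, in line with the abstract reporting a fraction of the maximum deduplication ratio rather than all of it.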
Keywords
Data structures; File systems; Fingerprint recognition; Layout; Metadata; Servers; Throughput; inline deduplication; primary storage; similarity;
fLanguage
English
Publisher
ieee
Conference_Titel
Networking, Architecture and Storage (NAS), 2015 IEEE International Conference on
Conference_Location
Boston, MA, USA
Type
conf
DOI
10.1109/NAS.2015.7255230
Filename
7255230