DocumentCode :
1435362
Title :
Combining Chunk Boundary and Chunk Signature Calculations for Deduplication
Author :
Litwin, W. ; Long, D. D. E. ; Schwarz, Thomas
Author_Institution :
Centre d'Etude et Rech. en Inf. Appliquée, Univ. Paris Dauphine, Paris, France
Volume :
10
Issue :
1
fYear :
2012
Firstpage :
1305
Lastpage :
1311
Abstract :
Many modern, large-scale storage solutions offer deduplication, which can achieve impressive compression rates for many loads, especially for backups. When accepting new data for storage, deduplication checks whether parts of the data are already stored. If this is the case, then the system does not store that part of the new data but replaces it with a reference to the location where the data already resides. A typical deduplication system breaks data into chunks, hashes each chunk, and uses an index to see whether the chunk has already been stored. Variable chunk systems offer better compression, but process data byte-for-byte twice, first to calculate the chunk boundaries and then to calculate the hash. This limits the ingress bandwidth of a system. We propose a method to reuse the chunk boundary calculations in order to strengthen the collision resistance of the hash, allowing us to use a faster hashing method with fewer bytes or a much larger (256 times by adding two bytes) storage system with the same high assurance against chunk collision and resulting data loss.
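The two-pass baseline that the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' combined method: the polynomial rolling hash, the constants `WINDOW`, `MASK`, `BASE`, and `MOD`, and the helper names `chunk_boundaries`, `store`, and `restore` are all assumptions chosen for the example, and SHA-256 stands in for whatever chunk signature a real system would use. Note that every byte is touched twice, once by the rolling hash and once by the chunk hash, which is the cost the paper aims to reduce.

```python
import hashlib

# All constants below are illustrative assumptions, not values from the paper.
WINDOW = 16            # sliding-window size in bytes (also the minimum chunk size)
MASK = (1 << 6) - 1    # declare a boundary when the low 6 bits of the hash are zero
BASE = 257             # base of the polynomial rolling hash
MOD = (1 << 31) - 1    # modulus of the polynomial rolling hash


def chunk_boundaries(data: bytes):
    """Pass 1: yield chunk end offsets chosen by a rolling hash over the content."""
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte that leaves the window
    h = 0
    last = 0
    for i, b in enumerate(data):
        if i >= WINDOW:
            # Drop the oldest byte so h covers only the last WINDOW bytes.
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + b) % MOD
        if i + 1 - last >= WINDOW and (h & MASK) == 0:
            last = i + 1
            yield last
    if last < len(data):
        yield len(data)  # final, possibly short, chunk


def store(data: bytes, index: dict) -> list:
    """Pass 2: hash each chunk and store it only if the index has not seen it."""
    recipe = []
    start = 0
    for end in chunk_boundaries(data):
        chunk = data[start:end]
        digest = hashlib.sha256(chunk).digest()
        index.setdefault(digest, chunk)  # duplicate chunks are not stored again
        recipe.append(digest)            # a reference replaces the chunk data
        start = end
    return recipe


def restore(recipe: list, index: dict) -> bytes:
    """Rebuild the original data from its list of chunk references."""
    return b"".join(index[d] for d in recipe)
```

Storing the same data a second time with `store(data, index)` returns the same recipe and leaves `index` unchanged, since every chunk digest is already present.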
Keywords :
data compression; digital signatures; storage management; chunk boundary; chunk signature calculations; data storage; deduplication system; hashing method; variable chunk systems; Fingerprint recognition; Indexes; Malware; Silicon; Silicon compounds; Vectors; Algebraic Signatures; Deduplication
fLanguage :
English
Journal_Title :
Latin America Transactions, IEEE (Revista IEEE America Latina)
Publisher :
IEEE
ISSN :
1548-0992
Type :
jour
DOI :
10.1109/TLA.2012.6142477
Filename :
6142477