Title of article :
SBBS: A sliding blocking algorithm with backtracking sub-blocks for duplicate data detection
Author/Authors :
Wang, GuiPing and Chen, ShuYu and Lin, Mingwei and Liu, XiaoWei
Issue Information :
Journal with serial number, year 2014
Abstract :
With the explosive growth of data, storage systems face heavy storage pressure from the mass of redundant data caused by duplicate copies or duplicate regions of files. Data deduplication is a storage-optimization technique that reduces the data footprint by eliminating redundant copies of data and storing only unique data. Data deduplication rests on duplicate data detection techniques, which divide files into parts, compare corresponding parts between files using hashes, and identify redundant data. This paper proposes SBBS, an efficient sliding blocking algorithm with backtracking sub-blocks for duplicate data detection. SBBS improves the duplicate data detection precision of the traditional sliding blocking (SB) algorithm by backtracking the left/right 1/4 and 1/2 sub-blocks in matching-failed segments. Experimental results show that, on average, SBBS improves duplicate detection precision by 6.5% compared with the traditional SB algorithm and by 16.5% compared with the content-defined chunking (CDC) algorithm, and that it adds little extra storage overhead when it divides files into equal chunks of 8 kB.
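The idea in the abstract can be illustrated with a minimal sketch. It is a simplified reading of sliding blocking with sub-block backtracking, not the procedure from the paper: the function names (`build_index`, `detect_duplicates`, `_backtrack`), the use of SHA-1 in place of a rolling checksum, the 8-byte block size in the usage lines, and the order in which sub-blocks are tried are all assumptions made for illustration.

```python
import hashlib


def _digest(chunk: bytes) -> bytes:
    # Assumption: SHA-1 stands in for whatever weak/strong hash pair is used.
    return hashlib.sha1(chunk).digest()


def build_index(reference: bytes, block_size: int):
    """Hash every fixed-size block plus its left/right 1/2 and 1/4 sub-blocks."""
    half, quarter = block_size // 2, block_size // 4
    blocks, left_subs, right_subs = set(), set(), set()
    for start in range(0, len(reference) - block_size + 1, block_size):
        block = reference[start:start + block_size]
        blocks.add(_digest(block))
        for n in (half, quarter):
            left_subs.add(_digest(block[:n]))
            right_subs.add(_digest(block[-n:]))
    return blocks, left_subs, right_subs


def _backtrack(segment: bytes, left_subs, right_subs, block_size: int) -> int:
    """Return the duplicate bytes recovered from one matching-failed segment."""
    sizes = (block_size // 2, block_size // 4)  # try the larger sub-block first
    found = 0
    for n in sizes:
        if len(segment) >= n and _digest(segment[:n]) in left_subs:
            found += n
            segment = segment[n:]  # keep the right-hand check from overlapping
            break
    for n in sizes:
        if len(segment) >= n and _digest(segment[-n:]) in right_subs:
            found += n
            break
    return found


def detect_duplicates(new: bytes, index, block_size: int, backtrack: bool = True) -> int:
    """Count the bytes of `new` detected as duplicates of indexed data."""
    blocks, left_subs, right_subs = index
    dup = 0
    failed = []  # (start, end) of matching-failed segments
    pos = gap_start = 0
    while pos + block_size <= len(new):
        if _digest(new[pos:pos + block_size]) in blocks:
            if pos > gap_start:
                failed.append((gap_start, pos))
            dup += block_size
            pos += block_size
            gap_start = pos
        else:
            pos += 1  # slide the window by one byte
            if pos - gap_start == block_size:
                failed.append((gap_start, pos))
                gap_start = pos
    if gap_start < len(new):
        failed.append((gap_start, len(new)))
    if backtrack:
        for start, end in failed:
            dup += _backtrack(new[start:end], left_subs, right_subs, block_size)
    return dup


# Usage: the middle block of `new` has its last two bytes modified.
reference = bytes(range(24))
new = bytes(range(8)) + bytes(range(8, 14)) + b"\xff\xff" + bytes(range(16, 24))
index = build_index(reference, 8)
print(detect_duplicates(new, index, 8, backtrack=False))  # plain sliding blocking
print(detect_duplicates(new, index, 8, backtrack=True))   # with sub-block backtracking
```

In this toy input, plain sliding blocking treats the whole modified block as new data, while the backtracking variant recovers the part that still matches an indexed sub-block. The sub-block hashes kept in the index are the source of the extra storage overhead.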
Keywords :
Backtracking , SBBS , Content-defined chunking algorithm , Data deduplication , Duplicate data detection , Sliding blocking algorithm
Journal title :
Expert Systems with Applications