Title :
Incremental Locality and Clustering-Based Compression
Author :
Krcal, Lubo ; Holub, Jan
Author_Institution :
Dept. of Theor. Comput. Sci., Czech Tech. Univ. in Prague, Prague, Czech Republic
Abstract :
Current compression solutions either use a limited-size, locality-based context or adapt to the entire input. The former misses similarities that lie far apart in the input, while the latter adapts too generically; both result in suboptimal compression effectiveness. Many deduplication and near-deduplication systems search for similarity across the entire input. Although most of these systems excel in simplicity and speed, none of them goes further toward full-scale redundancy removal. We propose a novel compression and archival system called ICBCS. Our system goes beyond standard measures for similarity detection, using an extended similarity hash and incremental clustering techniques to determine groups of sufficiently similar chunks designated for compression. ICBCS outperforms conventional file compression solutions on datasets consisting of at least mildly redundant files. It also shows that selective application of a weak compressor results in a better compression ratio and speed than conventional application of a strong compressor.
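The general idea of grouping similar chunks before compressing them can be illustrated with a minimal sketch. This is not the ICBCS implementation: the abstract does not specify the extended similarity hash or the clustering procedure, so the sketch substitutes a plain 64-bit SimHash over byte n-grams, a greedy nearest-representative clustering rule, and zlib as the weak compressor. The names `simhash`, `cluster_incrementally`, `compress_clusters`, and the `threshold` parameter are all hypothetical.

```python
import hashlib
import zlib

BITS = 64  # width of the toy similarity hash (assumption, not from the paper)


def simhash(chunk: bytes, ngram: int = 4) -> int:
    """Toy SimHash over byte n-grams; a stand-in for the paper's extended similarity hash."""
    counts = [0] * BITS
    for i in range(max(1, len(chunk) - ngram + 1)):
        digest = hashlib.blake2b(chunk[i:i + ngram], digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        for b in range(BITS):
            counts[b] += 1 if (h >> b) & 1 else -1
    # A bit is set when most n-gram hashes had that bit set.
    return sum(1 << b for b in range(BITS) if counts[b] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def cluster_incrementally(chunks, threshold: int = 16):
    """Assign each chunk, in arrival order, to the nearest existing cluster
    or open a new cluster when none is within `threshold` bits."""
    clusters = []  # list of (representative_hash, member_chunks)
    for chunk in chunks:
        h = simhash(chunk)
        best = min(clusters, key=lambda c: hamming(c[0], h), default=None)
        if best is not None and hamming(best[0], h) <= threshold:
            best[1].append(chunk)
        else:
            clusters.append((h, [chunk]))
    return clusters


def compress_clusters(clusters):
    """Compress each cluster's members together so shared content is encoded once."""
    return [zlib.compress(b"".join(members)) for _, members in clusters]
```

Compressing the members of a cluster as one stream lets even a lightweight compressor exploit redundancy between chunks that were far apart in the original input, which is the effect the abstract attributes to selective application of a weak compressor.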
Keywords :
data compression; ICBCS; archival system; clustering-based compression; compression system; datasets; deduplication systems; extended similarity hash; file compression; generic adaptation; incremental clustering techniques; locality-based context; similarity detection; Clustering algorithms; Compressors; Context; Couplings; Feature extraction; Redundancy; Solids;
Conference_Title :
Data Compression Conference (DCC), 2015
Conference_Location :
Snowbird, UT
DOI :
10.1109/DCC.2015.23