• DocumentCode
    719409
  • Title
    Incremental Locality and Clustering-Based Compression
  • Author
    Krčál, Luboš; Holub, Jan
  • Author_Institution
    Dept. of Theor. Comput. Sci., Czech Tech. Univ. in Prague, Prague, Czech Republic
  • fYear
    2015
  • fDate
    7-9 April 2015
  • Firstpage
    203
  • Lastpage
    212
  • Abstract
    Current compression solutions adapt either to a limited-size, locality-based context or to the entire input. This yields suboptimal compression effectiveness: in the former case, similarities that lie further apart are missed; in the latter, the adaptation is too generic. Many deduplication and near-deduplication systems search for similarity across the entire input. Although most of these systems excel in simplicity and speed, none of them goes further toward full-scale redundancy removal. We propose a novel compression and archival system called ICBCS. Our system goes beyond standard measures for similarity detection, using an extended similarity hash and incremental clustering techniques to determine groups of sufficiently similar chunks designated for compression. ICBCS outperforms conventional file compression solutions on datasets consisting of at least mildly redundant files. It also shows that selective application of a weak compressor yields a better compression ratio and speed than conventional application of a strong compressor.
  • Keywords
    data compression; ICBCS; archival system; clustering-based compression; compression system; datasets; deduplication systems; extended similarity hash; file compression; generic adaptation; incremental clustering techniques; locality-based context; similarity detection; Clustering algorithms; Compressors; Context; Couplings; Feature extraction; Redundancy; Solids;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Data Compression Conference (DCC), 2015
  • Conference_Location
    Snowbird, UT
  • ISSN
    1068-0314
  • Type
    conf
  • DOI
    10.1109/DCC.2015.23
  • Filename
    7149277