• DocumentCode
    3268918
  • Title

    ItCompress: an iterative semantic compression algorithm

  • Author

    Jagadish, H.V. ; Ng, Raymond T. ; Ooi, Beng Chin ; Tung, Anthony K.H.

  • Author_Institution
    Michigan Univ., Ann Arbor, MI, USA
  • fYear
    2004
  • fDate
    30 March-2 April 2004
  • Firstpage
    646
  • Lastpage
    657
  • Abstract
    Real datasets are often large enough to necessitate data compression. Traditional 'syntactic' data compression methods treat the table as a large byte string and operate at the byte level. The tradeoff in such cases is usually between the ease of retrieval (the ease with which one can retrieve a single tuple or attribute value without decompressing a much larger unit) and the effectiveness of the compression. In this regard, the use of semantic compression has generated considerable interest and motivated certain recent works. We propose a semantic compression algorithm called ItCompress (ITerative Compression), which achieves good compression while permitting access even at the attribute level without requiring the decompression of a larger unit. ItCompress iteratively improves the compression ratio of the compressed output during each scan of the table. The amount of compression can be tuned based on the number of iterations. Moreover, the initial iterations provide significant compression, thereby making it a cost-effective compression technique. Extensive experiments were conducted and the results indicate the superiority of ItCompress with respect to previously known techniques, such as 'SPARTAN' and 'fascicles'.
  • Keywords
    computational complexity; data compression; data mining; iterative methods; query processing; ItCompress iterative compression; SPARTAN; iterative semantic compression algorithm; Compression algorithms; Data analysis; Data engineering; Data warehouses; Databases; Information retrieval; Information technology; Monitoring; Upper bound
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2004. Proceedings. 20th International Conference on
  • ISSN
    1063-6382
  • Print_ISBN
    0-7695-2065-0
  • Type

    conf

  • DOI
    10.1109/ICDE.2004.1320034
  • Filename
    1320034