  • DocumentCode
    610064
  • Title
    Compressing Huffman Models on Large Alphabets
  • Author
    Navarro, G.; Ordóñez, A.
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Chile, Santiago de Chile, Chile
  • fYear
    2013
  • fDate
    20-22 March 2013
  • Firstpage
    381
  • Lastpage
    390
  • Abstract
    A naive storage of a Huffman model on a text of length n over an alphabet of size σ requires O(σ log n) bits. This can be reduced to σ log σ + O(σ) bits using canonical codes. This overhead over the entropy can be significant when σ is comparable to n, and it also dictates the amount of main memory required to compress or decompress. We design an encoding scheme that requires σ log log n + O(σ + log² n) bits in the worst case, and typically less, while supporting encoding and decoding of symbols in O(log log n) time. We show that our technique reduces the storage size of the model used by state-of-the-art techniques to around 15% on various real-life sequences over large alphabets, while still offering reasonable compression/decompression times.
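  • Example
    As context for the canonical codes mentioned in the abstract, here is a minimal Python sketch. It is illustrative only, not the paper's σ log log n + O(σ + log² n) scheme: it shows how canonical Huffman assignment reconstructs every code from the per-symbol code lengths alone, which is what lets a canonical model be stored as the code lengths plus the symbol order rather than as explicit codes.

        def canonical_codes(lengths):
            """Assign canonical Huffman codes from per-symbol code lengths.

            lengths: dict mapping symbol -> code length (the lengths must
            come from a valid Huffman tree, i.e. satisfy Kraft's equality).
            Returns a dict mapping symbol -> code as a bit string.
            """
            # Sorting by (length, symbol) fixes the assignment, so the
            # lengths alone determine the whole code table.
            order = sorted(lengths, key=lambda s: (lengths[s], s))
            codes, code, prev_len = {}, 0, 0
            for s in order:
                code <<= lengths[s] - prev_len  # extend code at deeper levels
                codes[s] = format(code, '0{}b'.format(lengths[s]))
                code += 1
                prev_len = lengths[s]
            return codes

        # Lengths {a:1, b:2, c:3, d:3} yield the prefix codes 0, 10, 110, 111.
        print(canonical_codes({'a': 1, 'b': 2, 'c': 3, 'd': 3}))

    Because decoding needs only the number of codes of each length plus the symbol order, canonical codes are the natural starting point for the model-compression results the paper reports.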
  • Keywords
    Huffman codes; computational complexity; decoding; entropy; formal languages; Huffman model compression; canonical codes; compression times; decompression times; encoding scheme; large alphabets; Computational modeling; Data structures; Encoding; Vectors; Huffman compression; compressed data structures
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Data Compression Conference (DCC), 2013
  • Conference_Location
    Snowbird, UT
  • ISSN
    1068-0314
  • Print_ISBN
    978-1-4673-6037-1
  • Type
    conf
  • DOI
    10.1109/DCC.2013.46
  • Filename
    6543074