• DocumentCode
    8963
  • Title

    Efficient and Compact Representations of Prefix Codes

  • Author

    Gagie, Travis ; Navarro, Gonzalo ; Nekrich, Yakov ; Ordonez, Alberto

  • Author_Institution
    Dept. of Comput. Sci., Univ. of Helsinki, Helsinki, Finland
  • Volume
    61
  • Issue
    9
  • fYear
    2015
  • fDate
    Sept. 2015
  • Firstpage
    4999
  • Lastpage
    5011
  • Abstract
    Most of the attention in statistical compression is given to the space used by the compressed sequence, a problem completely solved with optimal prefix codes. However, in many applications, the storage space used to represent the prefix code itself can be an issue. In this paper, we introduce and compare several techniques to store prefix codes. Let N be the sequence length and n be the alphabet size. Then, a naive storage of an optimal prefix code uses O(n log n) bits. Our first technique shows how to use O(n log log(N/n)) bits to store the optimal prefix code. Then, we introduce an approximate technique that, for any 0 <; ε <; 1/2, takes O(n log log(1/E)) bits to store a prefix code with an average codeword length within an additive ε of the minimum. Finally, a second approximation takes, for any constant c > 1, O(n1/c log n) bits to store a prefix code with an average codeword length at most c times the minimum. In all cases, our data structures allow encoding and decoding of any symbol in O(1) time. We experimentally compare our new techniques with the state of the art, showing that we achieve sixfold-to-eightfold space reductions, at the price of a slower encoding (2.5-8 times slower) and decoding (12-24 times slower). The approximations further reduce this space and improve the time significantly, up to recovering the speed of classical implementations, for a moderate penalty in the average code length. As a byproduct, we compare various heuristic, approximate, and optimal algorithms to generate length-restricted codes, showing that the optimal ones are clearly superior and practical enough to be implemented.
  • Keywords
    Huffman codes; decoding; encoding; statistical analysis; alphabet size; codeword length; compressed sequence; data structures; decoding; encoding; length-restricted codes; optimal prefix codes; statistical compression; storage space; Additives; Approximation methods; Arrays; Decoding; Encoding; Random access memory; Vegetation; Computers and information processing; data compression; data systems; huffman coding;
  • fLanguage
    English
  • Journal_Title
    Information Theory, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    0018-9448
  • Type

    jour

  • DOI
    10.1109/TIT.2015.2452252
  • Filename
    7154462