DocumentCode
610064
Title
Compressing Huffman Models on Large Alphabets
Author
Navarro, G. ; Ordóñez, A.
Author_Institution
Dept. of Comput. Sci., Univ. of Chile, Santiago de Chile, Chile
fYear
2013
fDate
20-22 March 2013
Firstpage
381
Lastpage
390
Abstract
A naive storage of a Huffman model on a text of length n over an alphabet of size σ requires O(σ log n) bits. This can be reduced to σ log σ + O(σ) bits using canonical codes. This overhead over the entropy can be significant when σ is comparable to n, and it also dictates the amount of main memory required to compress or decompress. We design an encoding scheme that requires σ log log n + O(σ + log² n) bits in the worst case, and typically less, while supporting encoding and decoding of symbols in O(log log n) time. We show that our technique reduces the storage size of the model of state-of-the-art techniques to around 15% in various real-life sequences over large alphabets, while still offering reasonable compression/decompression times.
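The σ log σ + O(σ) bound rests on canonical Huffman codes: once each symbol's code length is fixed, the codewords themselves need not be stored, since codes of equal length are assigned as consecutive binary integers. The following is a minimal sketch of that assignment (not the paper's σ log log n scheme); the function name and input format are illustrative, not from the source.

```python
def canonical_codes(lengths):
    """Assign canonical Huffman codewords from code lengths alone.

    lengths: dict mapping symbol -> code length (as produced by any
    Huffman tree construction). Symbols are sorted by (length, symbol)
    and given consecutive codewords, so the model only needs to store
    the lengths, not the codewords themselves.
    """
    code = 0
    prev_len = 0
    codes = {}
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        # Moving to a longer code length appends zeros on the right.
        code <<= (length - prev_len)
        codes[sym] = format(code, "0{}b".format(length))
        code += 1
        prev_len = length
    return codes
```

For example, lengths {a:1, b:2, c:3, d:3} yield the codes 0, 10, 110, 111: within each length, codewords are consecutive, which is what makes O(log log n)-time table-based decoding possible.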
Keywords
Huffman codes; computational complexity; decoding; entropy; formal languages; Huffman model compression; canonical codes; compression times; decompression times; encoding scheme; large alphabets; Computational modeling; Data structures; Encoding; Vectors; Huffman compression; compressed data structures
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Compression Conference (DCC), 2013
Conference_Location
Snowbird, UT
ISSN
1068-0314
Print_ISBN
978-1-4673-6037-1
Type
conf
DOI
10.1109/DCC.2013.46
Filename
6543074
Link To Document