مرکز منطقه ای اطلاع رساني علوم و فناوري - Compressing Huffman Models on Large Alphabets

DocumentCode :

610064

Title :

Compressing Huffman Models on Large Alphabets

Author :

Navarro, G. ; Ordonez, A.

Author_Institution :

Dept. of Comput. Sci., Univ. of Chile, Santiago de Chile, Chile

fYear :

2013

fDate :

20-22 March 2013

Firstpage :

381

Lastpage :

390

Abstract :

A naive storage of a Huffman model on a text of length n over an alphabet of size σ requires O(σlog n) bits. This can be reduced to σ logσ + O(σ) bits using canonical codes. This overhead over the entropy can be significant when σ is comparable to n, and it also dictates the amount of main memory required to compress or decompress. We design an encoding scheme that requires σlog log n+O(σ+log² n) bits in the worst case, and typically less, while supporting encoding and decoding of symbols in O(log log n) time. We show that our technique reduces the storage size of the model of state-of-the-art techniques to around 15% in various real-life sequences over large alphabets, while still offering reasonable compression/decompression times.

Keywords :

Huffman codes; computational complexity; decoding; entropy; formal languages; Huffman model compression; canonical codes; compression times; decoding; decompression times; encoding scheme; entropy; large alphabets; Computational modeling; Data structures; Decoding; Encoding; Entropy; Vectors; Vegetation; Huffman compression; compressed data structures;

fLanguage :

English

Publisher :

ieee

Conference_Titel :

Data Compression Conference (DCC), 2013

Conference_Location :

Snowbird, UT

ISSN :

1068-0314

Print_ISBN :

978-1-4673-6037-1

Type :

conf

DOI :

10.1109/DCC.2013.46

Filename :

6543074

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=610064