DocumentCode
65854
Title
A Universal Parallel Two-Pass MDL Context Tree Compression Algorithm
Author
Krishnan, Nikhil ; Baron, Dror
Author_Institution
Dept. of Electr. & Comput. Eng., North Carolina State Univ., Raleigh, NC, USA
Volume
9
Issue
4
fYear
2015
fDate
Jun-15
Firstpage
741
Lastpage
748
Abstract
Computing problems that handle large amounts of data necessitate the use of lossless data compression for efficient storage and transmission. We present a novel lossless universal data compression algorithm that uses parallel computational units to increase the throughput. The length- N input sequence is partitioned into B blocks. Processing each block independently of the other blocks can accelerate the computation by a factor of B, but degrades the compression quality. Instead, our approach is to first estimate the minimum description length (MDL) context tree source underlying the entire input, and then encode each of the B blocks in parallel based on the MDL source. With this two-pass approach, the compression loss incurred by using more parallel units is insignificant. Our algorithm is work-efficient, i.e., its computational complexity is O(N/B). Its redundancy is approximately Blog(N/B) bits above Rissanen´s lower bound on universal compression performance, with respect to any context tree source whose maximal depth is at most log(N/B). We improve the compression by using different quantizers for states of the context tree based on the number of symbols corresponding to those states. Numerical results from a prototype implementation suggest that our algorithm offers a better trade-off between compression and throughput than competing universal data compression algorithms.
Keywords
computational complexity; data compression; tree data structures; Rissanen lower bound; compression quality; computational complexity; lossless universal data compression algorithm; minimum description length context tree source; parallel computational units; quantizers; universal compression performance; universal parallel two-pass MDL context tree compression algorithm; Context; Data compression; Decoding; Encoding; Maximum likelihood estimation; Redundancy; Signal processing algorithms; Big data; computational complexity; data compression; distributed computing; minimum description length; parallel algorithms; redundancy; two-pass code; universal compression; work-efficient algorithms;
fLanguage
English
Journal_Title
Selected Topics in Signal Processing, IEEE Journal of
Publisher
ieee
ISSN
1932-4553
Type
jour
DOI
10.1109/JSTSP.2015.2403800
Filename
7042260
Link To Document