Title :
Symbol-Based Modeling and Coding of Block Markov Sources
Author :
Nagy, Dániel A. ; Gyorgy, Andras ; Linder, Tamás
Author_Institution :
Dept. of Math. & Stat., Queen´´s Univ., Kingston, Ont.
Abstract :
Industry-standard lossless compression algorithms (such as LZW) are usually implemented so that they work on bytes as symbols. Experiments indicate that data for which bytes are not the natural choice of symbols compress poorly using these implementations, while algorithms working on a bit level perform reasonably on byte-based data in addition to having computational advantages resulting from operating on a small alphabet. In this correspondence, we offer an information-theoretic explanation to these experimental results by assessing the redundancy (which is approximated by the divergence rate of two source distributions) of a bit-based model when applied to a byte-based source. More specifically, we study the problem of approximating a block Markov source (our model for byte-based data) with higher order Markov sources (which model bit-based Markov encoders), and show that the divergence rate between a block Markov source and the best matching higher order Markov model for that source converges to zero exponentially fast as the memory of the model increases. This result is applied to obtain bounds on the redundancy of certain symbol-based universal codes when they are used for byte-aligned sources
Keywords :
Markov processes; block codes; redundancy; source coding; block Markov sources coding; industry-standard lossless compression algorithm; redundancy; symbol-based modeling; Compression algorithms; Councils; Data compression; Entropy; Genetics; Mathematics; Proteins; Redundancy; Scholarships; Statistics; Binary codes; block Markov sources; byte-aligned sources; higher order Markov modeling; lossless coding;
Journal_Title :
Information Theory, IEEE Transactions on
DOI :
10.1109/TIT.2006.885539