Title :
Bounds on expansion in LZ´77-like coding
Author :
Castelli, Vittorio ; Lastras-Montano, Luis Alfonso
Author_Institution :
IBM T. J. Watson Res. Center, Yorktown Heights, NY, USA
fDate :
5/1/2006 12:00:00 AM
Abstract :
We investigate the maximum increase in number of phrases that results from changing k consecutive symbols in a string x having length n parsed using an LZ´77-like algorithm. We consider a class of compression algorithms that partition a sequence into a collection y of nonoverlapping, variable-length phrases and encode them. Each phrase either is a singleton or matches a substring that starts to its left. We show that changing a single symbol of x in position i can yield an expansion that is of order O(n-i)23/ as (n-i)→∞. Our lower bound requires an alphabet size of O(n-i)13/. We also show that changing k consecutive symbols starting from position i can yield an expansion having a similar but somewhat more involved form. The paper contains both analytically derived upper and lower bounds, and algorithms for numerically computing tighter bounds. While deriving the bounds, we provide a detailed analysis of how expansion can arise when changing consecutive symbols. This problem is motivated by management policies for computer systems, such as the IBM Memory eXpansion Technology (MXT) or the IBM iSeries compressed disks, that use LZ´77-like coding on small compression units, such as 1-4 kbyte, and store the compressed data in memory or on disk tracks. Here, when a change of a portion of the compression unit occurs, for example, an L2 cache line, or a 512-byte disk sector, the data is recompressed and potentially stored in a different location. Knowing the maximum expansion, rather than the average expansion, is an important factor for designing policies for allocation and management of memory or disk space.
Keywords :
data compression; storage management; variable length codes; IBM Memory eXpansion Technology; IBM iSeries compressed disks; LZ´77-like coding; Lempel-Ziv algorithm; compression algorithm; computer systems; disk space; management policy; memory allocation; variable-length encoding; Algorithm design and analysis; Communication system control; Compression algorithms; Data compression; Memory management; Partitioning algorithms; Source coding; Technology management; Upper bound; Visualization; Compression of individual sequences; Lempel–Ziv algorithm; data compression; lossless coding; maximum compressibility change;
Journal_Title :
Information Theory, IEEE Transactions on
DOI :
10.1109/TIT.2006.872848