DocumentCode :
1421300
Title :
PPM performance with BWT complexity: a fast and effective data compression algorithm
Author :
Effros, Michelle
Author_Institution :
Dept. of Electr. Eng., California Inst. of Technol., Pasadena, CA, USA
Volume :
88
Issue :
11
fYear :
2000
Firstpage :
1703
Lastpage :
1712
Abstract :
This paper introduces a new data compression algorithm. The goal underlying this new code design is to achieve a single lossless compression algorithm with the excellent compression ratios of the prediction by partial matching (PPM) algorithms and the low complexity of codes based on the Burrows-Wheeler Transform (BWT). Like the BWT-based codes, the proposed algorithm requires worst-case O(n) computational complexity and memory; in contrast, the unbounded-context PPM algorithm, called PPM*, requires worst-case O(n^2) computational complexity. Like PPM*, the proposed algorithm allows the use of unbounded contexts. Using standard data sets for comparison, the proposed algorithm achieves compression performance better than that of the BWT-based codes and comparable to that of PPM*. In particular, the proposed algorithm yields an average rate of 2.29 bits per character (bpc) on the Calgary corpus; this result compares favorably with the 2.33 and 2.34 bpc of PPM5 and PPM* (PPM algorithms), the 2.43 bpc of BW94 (the original BWT-based code), and the 3.64 and 2.69 bpc of compress and gzip (popular Unix compression algorithms based on Lempel-Ziv (LZ) coding techniques) on the same data set. The given code does not, however, match the best reported compression performance (2.12 bpc with PPMZ9) listed on the Calgary corpus results web page at the time of this publication. Results on the Canterbury corpus give a similar relative standing. The proposed algorithm gives an average rate of 2.15 bpc on the Canterbury corpus, while the Canterbury corpus web page gives average rates of 1.99 bpc for PPMZ9, 2.11 bpc for PPM5, 2.15 bpc for PPM7, 2.23 bpc for BZIP2 (a popular BWT-based code), and 3.31 and 2.53 bpc for compress and gzip, respectively.
Keywords :
computational complexity; source coding; transforms; BWT complexity; Burrows-Wheeler Transform; PPM performance; compression performance; compression ratios; data compression algorithm; lossless compression algorithm; prediction by partial matching; standard data sets; unbounded contexts; worst-case O(n) computational complexity; Algorithm design and analysis; Code standards; Compression algorithms; Computational complexity; Context modeling; Data compression; Performance loss; Source coding; Vegetation mapping; Web pages;
fLanguage :
English
Journal_Title :
Proceedings of the IEEE
Publisher :
IEEE
ISSN :
0018-9219
Type :
jour
DOI :
10.1109/5.892706
Filename :
892706