DocumentCode :
2398155
Title :
Unbounded length contexts for PPM
Author :
Cleary, John G. ; Teahan, W.J. ; Witten, Ian H.
Author_Institution :
Dept. of Comput. Sci., Waikato Univ., Hamilton, New Zealand
fYear :
1995
fDate :
28-30 Mar 1995
Firstpage :
52
Lastpage :
61
Abstract :
The prediction by partial matching (PPM) data compression scheme has set the performance standard in lossless compression of text throughout the past decade. The original algorithm was first published in 1984 by Cleary and Witten, and a series of improvements was described by Moffat (1990), culminating in a careful implementation, called PPMC, which has become the benchmark version. This still achieves results superior to virtually all other compression methods, despite many attempts to better it. PPM, is a finite-context statistical modeling technique that can be viewed as blending together several fixed-order context models to predict the next character in the input sequence. Prediction probabilities for each context in the model are calculated from frequency counts which are updated adaptively; and the symbol that actually occurs is encoded relative to its predicted distribution using arithmetic coding. The paper describes a new algorithm, PPM*, which exploits contexts of unbounded length. It reliably achieves compression superior to PPMC, although our current implementation uses considerably greater computational resources (both time and space). The basic PPM compression scheme is described, showing the use of contexts of unbounded length, and how it can be implemented using a tree data structure. Some results are given that demonstrate an improvement of about 6% over the old method
Keywords :
arithmetic codes; data compression; estimation theory; prediction theory; probability; statistical analysis; tree data structures; word processing; PPM*; PPMC; arithmetic coding; data compression; finite-context statistical modeling; fixed-order context models; frequency counts; input sequence; lossless text compression; performance standard; predicted distribution; prediction by partial matching; prediction probabilities; tree data structure; unbounded length contexts; Arithmetic; Benchmark testing; Computer science; Context modeling; Data compression; Data structures; Frequency; Performance loss; Predictive models; Probability;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Data Compression Conference, 1995. DCC '95. Proceedings
Conference_Location :
Snowbird, UT
ISSN :
1068-0314
Print_ISBN :
0-8186-7012-6
Type :
conf
DOI :
10.1109/DCC.1995.515495
Filename :
515495
Link To Document :
بازگشت