مرکز منطقه ای اطلاع رساني علوم و فناوري

DocumentCode :

2398155

Title :

Unbounded length contexts for PPM

Author :

Cleary, John G. ; Teahan, W.J. ; Witten, Ian H.

Author_Institution :

Dept. of Comput. Sci., Waikato Univ., Hamilton, New Zealand

fYear :

1995

fDate :

28-30 Mar 1995

Firstpage :

Lastpage :

Abstract :

The prediction by partial matching (PPM) data compression scheme has set the performance standard in lossless compression of text throughout the past decade. The original algorithm was first published in 1984 by Cleary and Witten, and a series of improvements was described by Moffat (1990), culminating in a careful implementation, called PPMC, which has become the benchmark version. This still achieves results superior to virtually all other compression methods, despite many attempts to better it. PPM, is a finite-context statistical modeling technique that can be viewed as blending together several fixed-order context models to predict the next character in the input sequence. Prediction probabilities for each context in the model are calculated from frequency counts which are updated adaptively; and the symbol that actually occurs is encoded relative to its predicted distribution using arithmetic coding. The paper describes a new algorithm, PPM*, which exploits contexts of unbounded length. It reliably achieves compression superior to PPMC, although our current implementation uses considerably greater computational resources (both time and space). The basic PPM compression scheme is described, showing the use of contexts of unbounded length, and how it can be implemented using a tree data structure. Some results are given that demonstrate an improvement of about 6% over the old method

Keywords :

arithmetic codes; data compression; estimation theory; prediction theory; probability; statistical analysis; tree data structures; word processing; PPM*; PPMC; arithmetic coding; data compression; finite-context statistical modeling; fixed-order context models; frequency counts; input sequence; lossless text compression; performance standard; predicted distribution; prediction by partial matching; prediction probabilities; tree data structure; unbounded length contexts; Arithmetic; Benchmark testing; Computer science; Context modeling; Data compression; Data structures; Frequency; Performance loss; Predictive models; Probability;

fLanguage :

English

Publisher :

ieee

Conference_Titel :

Data Compression Conference, 1995. DCC '95. Proceedings

Conference_Location :

Snowbird, UT

ISSN :

1068-0314

Print_ISBN :

0-8186-7012-6

Type :

conf

DOI :

10.1109/DCC.1995.515495

Filename :

515495

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2398155