DocumentCode :
3011769
Title :
Estimation of escape probabilities for PPM based on universal source coding theory
Author :
Åberg, J. ; Shtarkov, Yu.M. ; Smeets, B.J.M.
Author_Institution :
Dept. of Inf. Technol., Lund Univ., Sweden
fYear :
1997
fDate :
29 Jun-4 Jul 1997
Firstpage :
65
Abstract :
Some of the best compression ratios for text compression are provided by the PPM (prediction by partial matching) class of algorithms. These algorithms are based on arithmetic coding using a fixed-depth Markov chain model of the source, i.e., the subsequence of symbols generated in any state s of the source is assumed to be the output of a memoryless subsource w=w(s). One of the most crucial steps in any PPM algorithm is the choice of the escape probability, i.e., the probability of a previously unseen symbol appearing in a given state. Let A={1,...,M} be the source alphabet, xk=x1x1...xk∈Ak be a sequence of symbols generated by a subsource w with unknown parameters, and m=m(xk) be the number of different symbols in xk. In most incarnations of PPM, the expression for the escape probability used for coding is of the form ϑ(E|xk )=k(k,m)/k+mα+k(k,m), where α is a parameter. The encoding of the “escape symbol” E is followed by a description of the new symbol, which is independent of the choice of escape probability and is not discussed. In almost all PPM schemes, the expression for the escape probability has been chosen heuristically. Following the method of Shtarkov et al. (see Probl. of Information Transmission, vol.31, no.2, p.20-35, 1995) we use a universal coding approach
Keywords :
Markov processes; arithmetic codes; data compression; document image processing; image coding; image matching; memoryless systems; prediction theory; probability; source coding; word processing; algorithms; arithmetic coding; compression ratios; encoding; escape probabilities estimation; escape symbol; fixed-depth Markov chain model; memoryless subsource; prediction by partial matching; source alphabet; symbols subsequence; text compression; universal source coding theory; Data compression; Encoding; Frequency estimation; Information theory; Optimization methods; Optimized production technology; Probability distribution; Source coding; State estimation;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Information Theory. 1997. Proceedings., 1997 IEEE International Symposium on
Conference_Location :
Ulm
Print_ISBN :
0-7803-3956-8
Type :
conf
DOI :
10.1109/ISIT.1997.612980
Filename :
612980
Link To Document :
بازگشت