DocumentCode :
3061447
Title :
On taking advantage of similarities between parameters in lossless sequential coding
Author :
Åberg, Jan
Author_Institution :
Dept. of Inf. Technol., Lund Univ., Sweden
fYear :
1999
fDate :
29-31 Mar 1999
Firstpage :
513
Abstract :
Summary form only given. In sequential lossless data compression algorithms the data stream is often transformed into short subsequences that are modeled as memoryless. Then it is desirable to use any information that each sequence might provide about the behaviour of other sequences that can be expected to have similar properties. Here we examine one such situation, as follows. We want to encode, using arithmetic coding with a sequential estimator, an M-ary memoryless source with unknown parameters θ, from which we have encoded already a sequence xn. In addition, both the encoder and the decoder have observed a sequence yn that is generated independently by another source with unknown parameters θ˜ that are known to be “similar” to θ by a pseudodistance δ(θ,θ˜) that is approximately equal to the relative entropy. Known to both sides is also a number d such that δ(θ,θ˜)⩽d. For a stand-alone memoryless source, the worst-case average redundancy of the (n+1)-th encoding is lower bounded by 0.5(M-1)/n+O(1/n2), and the Dirichlet estimator is close to optimal for this case. We show that this bound holds also for the case with side information as described above, meaning that we can improve, at best, the O(1/n2)-term. We define a frequency weighted estimator for this. Application of the frequency weighted estimator to to the PPM algorithm (Bell et al., 1989) by weighting order-4 statistics into an order-5 model, with d estimated during encoding, yields improvements that are consistent with the bounds above, which means that in practice we improve the performance by about 0.5 bits per active state of the model, making a gain of approximately 20000 bits on the Calgary Corpus
Keywords :
arithmetic codes; data compression; redundancy; sequential estimation; Calgary Corpus; Dirichlet estimator; M-ary memoryless source; arithmetic coding; decoder; encoder; frequency weighted estimator; lossless data compression algorithms; lossless sequential coding; pseudodistance; redundancy; relative entropy; side information; stand-alone memoryless source; worst-case average redundancy; Arithmetic; Data compression; Decoding; Encoding; Entropy; Frequency estimation; Performance gain; State estimation; Statistics; Yield estimation;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Data Compression Conference, 1999. Proceedings. DCC '99
Conference_Location :
Snowbird, UT
ISSN :
1068-0314
Print_ISBN :
0-7695-0096-X
Type :
conf
DOI :
10.1109/DCC.1999.785670
Filename :
785670
Link To Document :
بازگشت