مرکز منطقه ای اطلاع رساني علوم و فناوري - On taking advantage of similarities between parameters in lossless sequential coding

DocumentCode :

3061447

Title :

On taking advantage of similarities between parameters in lossless sequential coding

Author :

Åberg, Jan

Author_Institution :

Dept. of Inf. Technol., Lund Univ., Sweden

fYear :

1999

fDate :

29-31 Mar 1999

Firstpage :

513

Abstract :

Summary form only given. In sequential lossless data compression algorithms the data stream is often transformed into short subsequences that are modeled as memoryless. Then it is desirable to use any information that each sequence might provide about the behaviour of other sequences that can be expected to have similar properties. Here we examine one such situation, as follows. We want to encode, using arithmetic coding with a sequential estimator, an M-ary memoryless source with unknown parameters θ, from which we have encoded already a sequence xⁿ. In addition, both the encoder and the decoder have observed a sequence yⁿ that is generated independently by another source with unknown parameters θ˜ that are known to be “similar” to θ by a pseudodistance δ(θ,θ˜) that is approximately equal to the relative entropy. Known to both sides is also a number d such that δ(θ,θ˜)⩽d. For a stand-alone memoryless source, the worst-case average redundancy of the (n+1)-th encoding is lower bounded by 0.5(M-1)/n+O(1/n²), and the Dirichlet estimator is close to optimal for this case. We show that this bound holds also for the case with side information as described above, meaning that we can improve, at best, the O(1/n²)-term. We define a frequency weighted estimator for this. Application of the frequency weighted estimator to to the PPM algorithm (Bell et al., 1989) by weighting order-4 statistics into an order-5 model, with d estimated during encoding, yields improvements that are consistent with the bounds above, which means that in practice we improve the performance by about 0.5 bits per active state of the model, making a gain of approximately 20000 bits on the Calgary Corpus

Keywords :

arithmetic codes; data compression; redundancy; sequential estimation; Calgary Corpus; Dirichlet estimator; M-ary memoryless source; arithmetic coding; decoder; encoder; frequency weighted estimator; lossless data compression algorithms; lossless sequential coding; pseudodistance; redundancy; relative entropy; side information; stand-alone memoryless source; worst-case average redundancy; Arithmetic; Data compression; Decoding; Encoding; Entropy; Frequency estimation; Performance gain; State estimation; Statistics; Yield estimation;

fLanguage :

English

Publisher :

ieee

Conference_Titel :

Data Compression Conference, 1999. Proceedings. DCC '99

Conference_Location :

Snowbird, UT

ISSN :

1068-0314

Print_ISBN :

0-7695-0096-X

Type :

conf

DOI :

10.1109/DCC.1999.785670

Filename :

785670

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3061447