DocumentCode :
2742985
Title :
Sequential universal lossless techniques for compression of patterns and their description length
Author :
Shamir, Gil I.
Author_Institution :
Dept. of Electr. & Comput. Eng., Utah Univ., Salt Lake City, UT, USA
fYear :
2004
fDate :
23-25 March 2004
Firstpage :
419
Lastpage :
428
Abstract :
A pattern is a sequence of indices that contains all consecutive integer indices up to some integer k in increasing order of first occurrence. If the alphabet of a source that generated a sequence is unknown, the inevitable cost of coding the unknown alphabet symbols can be exploited to create the pattern of the sequence, which, in turn, can be compressed by itself. In this paper, two low-complexity sequential schemes are proposed for universally compressing patterns that are obtained from sequences generated by independently identically distributed (i.i.d.) sources with unknown (possibly large) alphabets of unknown size. The description lengths both schemes assign to a pattern are investigated and bounded by rigorous closed form expressions in terms of the maximum likelihood (ML) probability of the underlying i.i.d. sequence. In particular, each distinct index in the pattern is shown to cost 0.5 log(n/k3)+1.59 log e bits above the i.i.d. ML cost. This results in description length for unknown parameters that is shorter than the minimum code length of an i.i.d. sequence if there are more than e1918/·n13/ indices in the pattern. The sequential performance results are then used to establish a connection between the pattern entropy and the underlying i.i.d. entropy. This final result points out that for large alphabets (including those larger than n), recently derived universal coding redundancy bounds for coding patterns are negligible compared to the reduction in entropy from the underlying i.i.d. one.
Keywords :
computational complexity; maximum likelihood sequence estimation; source coding; consecutive integer index; independently identically distributed source; low-complexity sequential scheme; maximum likelihood probability; pattern compression; pattern entropy; underlying i.i.d. entropy; Cities and towns; Costs; Data compression; Entropy; Frequency estimation; Gas insulated transmission lines; Maximum likelihood decoding; Statistical distributions;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Data Compression Conference, 2004. Proceedings. DCC 2004
ISSN :
1068-0314
Print_ISBN :
0-7695-2082-0
Type :
conf
DOI :
10.1109/DCC.2004.1281487
Filename :
1281487
Link To Document :
بازگشت