Title :
Re-pair Achieves High-Order Entropy
Author :
Navarro, Gonzalo ; Russo, Luís
Author_Institution :
Univ. of Chile, Santiago
Abstract :
Re-pair is a dictionary-based compression method invented in 1999 by J. Larsson and A. Moffat [Off-line dictionary-based compression. Proc. IEEE, 88(11):1722-1732, 2000], lacking up to now an efficiency analysis. We show that re-pair compresses a sequence T[1,n] over an alphabet of size sigma to at most 2nHk + o(n log sigma) bits, for any k = o(logsigma n), where Hk is either the classical information-theory or the empirical k-th order entropy (in the latter, the model is inferred from the sequence statistics).
Keywords :
data compression; entropy; sequences; statistical analysis; dictionary-based compression method; high-order entropy; information theory; re-pair; sequence compression; sequence statistics; Algebra; Compression algorithms; Computer science; Data compression; Dictionaries; Entropy; Frequency; Statistics; empirical entropy; grammar-based compression;
Conference_Titel :
Data Compression Conference, 2008. DCC 2008
Conference_Location :
Snowbird, UT
Print_ISBN :
978-0-7695-3121-2
DOI :
10.1109/DCC.2008.79