Title :
Source coding, large deviations, and approximate pattern matching
Author :
Dembo, Amir ; Kontoyiannis, Ioannis
Author_Institution :
Departments of Math. & Stat., Stanford Univ., CA, USA
fDate :
6/1/2002 12:00:00 AM
Abstract :
We present a development of parts of rate-distortion theory and pattern-matching algorithms for lossy data compression, centered around a lossy version of the asymptotic equipartition property (AEP). This treatment closely parallels the corresponding development in lossless compression, a point of view that was advanced in an important paper of Wyner and Ziv in 1989. In the lossless case, we review how the AEP underlies the analysis of the Lempel-Ziv algorithm by viewing it as a random code and reducing it to the idealized Shannon code. This also provides information about the redundancy of the Lempel-Ziv algorithm and about the asymptotic behavior of several relevant quantities. In the lossy case, we give various versions of the statement of the generalized AEP and we outline the general methodology of its proof via large deviations. Its relationship with Barron (1985) and Orey\´s (1985, 1986) generalized AEP is also discussed. The lossy AEP is applied to (i) prove strengthened versions, of Shannon\´s(1948, 1974) direct source-coding theorem and universal coding theorems; (ii) characterize the performance of "mismatched" codebooks in lossy data compression; ( iii) analyze the performance of pattern-matching algorithms for lossy compression (including Lempel-Ziv schemes); and (iv) determine the first-order asymptotic of waiting times between stationary processes. A refinement to the lossy AEP is then presented, and it is used to (i) prove second-order (direct and converse) lossy source-coding theorems, including universal coding theorems; (ii) characterize which sources are quantitatively easier to compress; (iii) determine the second-order asymptotic of waiting times between stationary processes; and (iv) determine the precise asymptotic behavior of longest match-lengths between stationary processes. Finally, we discuss extensions of the above framework and results to random fields
Keywords :
approximation theory; data compression; pattern matching; random codes; rate distortion theory; source coding; Lempel-Ziv algorithm; approximate pattern matching; asymptotic equipartition property; direct source coding theorem; first-order waiting time asymptotic; generalized AEP; idealized Shannon code; large deviations; longest match-lengths; lossless compression; lossy data compression; mismatched codebooks; pattern-matching algorithms; random code; random fields; rate-distortion theory; redundancy; second-order lossy source-coding theorems; second-order waiting time asymptotic; stationary processes; universal coding theorems; Algorithm design and analysis; Codes; Data analysis; Data compression; Data mining; Mathematics; Pattern matching; Performance loss; Rate-distortion; Source coding;
Journal_Title :
Information Theory, IEEE Transactions on
DOI :
10.1109/TIT.2002.1003841