Fast multi-match Lempel-Ziv

Author

Pinho, M.S. ; Finamore, W.A. ; Pearlman, W.A.

Author_Institution

Rensselaer Polytech. Inst., Troy, NY, USA

fYear

1999

fDate

29-31 Mar 1999

Firstpage

545

Abstract

Summary form only given. One of the most popular encoders in the literature is the LZ78, which was proposed by Ziv and Lempel (1978). We establish a recursive way to find the longest m-tuple match. We prove the following theorem that shows how to obtain a longest (m+1)-tuple match from the longest m-tuple match. It shows that a (m+1)-tuple match is the concatenation of the first (m-1) words of the m-tuple match with the next longest double match. Therefore, the longest (m+1)-tuple match can be found using the m-tuple match and a procedure to compute the longest double match. Our theorem is as follows. Let A be a source alphabet, let A* be the set of all finite strings of A, and D⊂A*, such that if x∈D then all prefixes of x belong to D. Let u denote a one-sided infinite sequence. If b₁^m is the longest m-tuple match in u, with respect to D, then there is a longest (m+1)-tuple match bˆ₁^m+1, such that bˆ _i=b_i,∀i∈{1,...m-1}. We implemented the fast mmLZ and the results show a improvement in compression of around 5% over the LZW, in the Canterbury Corpus (Arnold and Bell, 1997) with little extra computational cost

Keywords

concatenated codes; data compression; sequences; string matching; Canterbury Corpus; LZ78; compression; concatenation; infinite sequence; m-tuple match; multi-match Lempel-Ziv encoder; recursive match; Algorithm design and analysis; Compression algorithms; Computational efficiency; Data compression; Dictionaries; Encoding; Numerical analysis;

fLanguage

English

Publisher

ieee

Conference_Titel

Data Compression Conference, 1999. Proceedings. DCC '99

Conference_Location

Snowbird, UT

ISSN

1068-0314

Print_ISBN

0-7695-0096-X

Type

conf

DOI

10.1109/DCC.1999.785702

Filename

785702