• DocumentCode
    3852888
  • Title

    A suboptimal lossy data compression based on approximate pattern matching

  • Author

    T. Luczak;W. Szpankowski

  • Author_Institution
    Math. Inst., Polish Acad. of Sci., Poznan, Poland
  • Volume
    43
  • Issue
    5
  • fYear
    1997
  • Firstpage
    1439
  • Lastpage
    1451
  • Abstract
    A practical suboptimal (variable source coding) algorithm for lossy data compression is presented. This scheme is based on approximate string matching, and it naturally extends the lossless Lempel-Ziv (1977) data compression scheme. Among others we consider the typical length of an approximately repeated pattern within the first n positions of a stationary mixing sequence where D percent of mismatches is allowed. We prove that there exists a constant r/sub 0/(D) such that the length of such an approximately repeated pattern converges in probability to 1/r/sub 0/(D) log n (pr.) but it almost surely oscillates between 1/r/sub -/spl infin//(D) log n and 2/r/sub 1/(D) log n, where r/sub -/spl infin//(D)>r/sub 0/(D)>r/sub 1/(D)/2 are some constants. These constants are natural generalizations of Renyi entropies to the lossy environment. More importantly, we show that the compression ratio of a lossy data compression scheme based on such an approximate pattern matching is asymptotically equal to r/sub 0/(D). We also establish the asymptotic behavior of the so-called approximate waiting time N/sub l/ which is defined as the time until a pattern of length C repeats approximately for the first time. We prove that log N/sub l//l/spl rarr/r/sub 0/(D) (pr.) as l/spl rarr//spl infin/. In general, r/sub 0/(D)>R(D) where R(D) is the rate distortion function. Thus for stationary mixing sequences we settle in the negative the problem investigated by Steinberg and Gutman by showing that a lossy extension of the Wyner-Ziv (1989) scheme cannot be optimal.
  • Keywords
    "Pattern matching","Data compression","Entropy","Rate-distortion","Multimedia databases","HDTV","Algorithm design and analysis","Source coding","Multimedia computing","Genetics"
  • Journal_Title
    IEEE Transactions on Information Theory
  • Publisher
    ieee
  • ISSN
    0018-9448
  • Type

    jour

  • DOI
    10.1109/18.623143
  • Filename
    623143