• DocumentCode
    751991
  • Title

    A universal predictor based on pattern matching

  • Author

    Jacquet, Philippe ; Szpankowski, Wojciech ; Apostol, Izydor

  • Author_Institution
    Inst. Nat. de Recherche en Inf. et Autom., Le Chesnay, France
  • Volume
    48
  • Issue
    6
  • fYear
    2002
  • fDate
    6/1/2002 12:00:00 AM
  • Firstpage
    1462
  • Lastpage
    1472
  • Abstract
    We consider a universal predictor based on pattern matching. Given a sequence X1, ..., Xn drawn from a stationary mixing source, it predicts the next symbol Xn+1 based on selecting a context of Xn+1. The predictor, called the sampled pattern matching (SPM), is a modification of the Ehrenfeucht-Mycielski (1992) pseudorandom generator algorithm. It predicts the value of the most frequent symbol appearing at the so-called sampled positions. These positions follow the occurrences of a fraction of the longest suffix of the original sequence that has another copy inside X1X2···Xn ; that is, in SPM, the context selection consists of taking certain fraction of the longest match. The study of the longest match for lossless data compression was initiated by Wyner and Ziv in their 1989 seminal paper. Here, we estimate the redundancy of the SPM universal predictor, that is, we prove that the probability the SPM predictor makes worse decisions than the optimal predictor is O(n) for some 0<ν<½ as n→∞. As a matter of fact, we show that we can predict K=O(1) symbols with the same probability of error
  • Keywords
    data compression; error statistics; pattern matching; prediction theory; sequences; signal sampling; SPM universal predictor; error probability; longest match; longest suffix; lossless data compression; optimal predictor; pseudorandom generator algorithm; redundancy; sampled pattern matching; sampled positions; sequence; stationary mixing source; universal predictor; Biological control systems; Communication system control; Computer science; Data compression; Information theory; Pattern matching; Redundancy; Scanning probe microscopy; Sequences; Source coding;
  • fLanguage
    English
  • Journal_Title
    Information Theory, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    0018-9448
  • Type

    jour

  • DOI
    10.1109/TIT.2002.1003834
  • Filename
    1003834