Abstract:
Motivated by the evident success of context-tree based methods in lossless data compression, we explore, in this paper, methods of the same spirit in universal prediction of individual sequences. By context-tree prediction, we refer to a family of prediction schemes where, at each time instant $t$, after having observed all outcomes of the data sequence $x_1,\ldots,x_{t-1}$, but not yet $x_t$, the prediction is based on a "context" (or a state) that consists of the $k$ most recent past outcomes $x_{t-k},\ldots,x_{t-1}$, where the choice of $k$ may depend on the contents of a possibly longer, though limited, portion of the observed past, $x_{t-k_{\max}},\ldots,x_{t-1}$. This is different from the study reported in Feder et al. (1992), where general finite-state predictors, as well as "Markov" (finite-memory) predictors of fixed order, were studied in the regime of individual sequences. Another important difference between this study and Feder et al. (1992) lies in the asymptotic regime: while in Feder et al. (1992) the resources of the predictor (i.e., the number of states or the memory size) were kept fixed regardless of the length $N$ of the data sequence, here we investigate situations where the number of contexts, or states, is allowed to grow concurrently with $N$. We are primarily interested in the following fundamental question: what is the critical growth rate of the number of contexts, below which the performance of the best context-tree predictor is still universally achievable, but above which it is not? We show that this critical growth rate is linear in $N$. In particular, we propose a universal context-tree algorithm that essentially achieves optimum performance as long as the growth rate is sublinear, and show that, on the other hand, this is impossible in the linear case.
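To make the notion of a context-tree predictor concrete, the following is a minimal Python sketch for a binary alphabet. The representation of the tree as a set of suffix tuples, the majority-vote prediction rule, and all identifiers are illustrative assumptions chosen for brevity; this is not the universal algorithm proposed in the paper, only an instance of the class of predictors the abstract describes.

```python
# A minimal sketch of a single context-tree predictor over {0, 1}.
# The tree is given as a set of suffixes (x_{t-k}, ..., x_{t-1}) of
# length at most k_max; the majority-vote rule per context is an
# illustrative assumption, not the paper's algorithm.

from collections import defaultdict


class ContextTreePredictor:
    def __init__(self, contexts, k_max):
        # `contexts` is a set of suffix tuples; ideally every long
        # enough past matches exactly one suffix of length <= k_max.
        self.contexts = set(contexts)
        self.k_max = k_max
        self.counts = defaultdict(lambda: [0, 0])  # per-context symbol counts

    def _context(self, past):
        # Scan the k_max most recent outcomes and return the shortest
        # suffix that is a context of the tree; the choice of k thus
        # depends on the contents of the recent past, as in the abstract.
        for k in range(1, min(self.k_max, len(past)) + 1):
            suffix = tuple(past[-k:])
            if suffix in self.contexts:
                return suffix
        return ()  # fall back to the empty (root) context

    def predict(self, past):
        # Predict the majority symbol observed so far in the current context.
        c0, c1 = self.counts[self._context(past)]
        return 1 if c1 > c0 else 0

    def update(self, past, outcome):
        self.counts[self._context(past)][outcome] += 1


# Usage: contexts {(0,), (0,1), (1,1)} mean k=1 after a 0 and k=2 after a 1.
predictor = ContextTreePredictor({(0,), (0, 1), (1, 1)}, k_max=2)
x = [0, 1, 1, 0, 1, 1, 0, 1, 1]
errors = 0
for t in range(len(x)):
    errors += predictor.predict(x[:t]) != x[t]
    predictor.update(x[:t], x[t])
print(f"fraction of errors: {errors / len(x):.2f}")
```

In the regime studied in the paper, one would compare such a universal scheme against the best predictor of this form chosen in hindsight, with the number of contexts (leaves of the tree) allowed to grow with the sequence length $N$.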