Author_Institution :
Dept. of Electr. Eng., Technion-Israel Inst. of Technol., Haifa, Israel
Abstract :
Inspired by C. E. Shannon's celebrated paper "Prediction and entropy of printed English" (1951), we consider the optimal prediction error for unknown finite-alphabet ergodic Markov sources, for prediction algorithms that infer the most probable incoming letter, where the distribution of the unknown source is apparent only via a short training sequence of N + 1 letters. We allow N to be a polynomial function of K, the order of the Markov source, rather than the classical setting in which N is allowed to be exponential in K. A lower bound on the prediction error is derived for universal prediction algorithms that are based on suffixes observed somewhere in the past training sequence $X_{-N}^{-1}$; that is, it is assumed that the universal predictor, given the past (N + 1)-sequence that serves as a training sequence, is no better than the optimal predictor given only the longest suffix that appeared somewhere in the past vector $X_{-N}^{-1}$. For a class of stationary Markov sources (which includes all Markov sources with positive transition probabilities), a particular universal predictor is introduced, and it is demonstrated that its performance is "optimal" in the sense that it yields a prediction error close to the lower bound on the universal prediction error, with limited training data. The results are nonasymptotic in the sense that they express the effect of limited training data on the efficiency of universal predictors. An asymptotically optimal universal predictor based on pattern matching appears elsewhere in the literature; however, the prediction error of such algorithms does not necessarily approach the lower bound in the nonasymptotic region.
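To make the suffix-based prediction rule concrete, below is a minimal Python sketch of a longest-suffix-match predictor. This is not the paper's algorithm, and all names here are illustrative assumptions: given the training sequence and the recent past, it searches for the longest suffix of the past that occurs somewhere in the training data and predicts the letter that most frequently follows that suffix.

```python
# Illustrative sketch (not the paper's algorithm) of a
# longest-suffix-match universal predictor: predict the letter that
# most often follows, in the training data, the longest suffix of the
# recent past that appears somewhere in that training data.
from collections import Counter

def predict_next(training, past, max_order=None):
    """Predict the most probable next letter from suffix statistics."""
    if max_order is None:
        max_order = len(past)
    # Try suffixes of the past, longest first.
    for k in range(min(max_order, len(past)), 0, -1):
        suffix = past[-k:]
        followers = Counter()
        # Count which letters follow each occurrence of the suffix
        # in the training sequence.
        for i in range(len(training) - k):
            if training[i:i + k] == suffix:
                followers[training[i + k]] += 1
        if followers:
            # The longest suffix that appeared in the training data
            # determines the prediction.
            return followers.most_common(1)[0][0]
    # No suffix matched: fall back to the overall most frequent letter.
    return Counter(training).most_common(1)[0][0]

# Usage: a short binary training sequence from some unknown source.
training = "0110101101101011010110101"
past = "1011"
print(predict_next(training, past))  # prints the predicted next letter
```

The fallback to shorter suffixes matters in the limited-training regime the abstract describes: an order-K source over an alphabet of size A has A^K distinct contexts, so a training sequence whose length is only polynomial in K cannot exhibit them all, and the predictor must rely on whatever shorter context the training data does contain.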
Keywords :
Markov processes; entropy; pattern matching; polynomials; prediction theory; probability; efficient universal prediction algorithm; finite-alphabet ergodic Markov sources; limited training data; longest suffix; lower bound; nonasymptotic region; optimal prediction error; polynomial function; positive transition probabilities; printed English; source coding; stationary Markov sources; training sequence; unknown sources; error analysis; inference algorithms; Jacobian matrices; prediction algorithms; statistics; training data