Title :
Bounds on estimated Markov orders of individual sequences
Author :
Vitale, Luciana ; Martín, Álvaro ; Seroussi, Gadiel
Author_Institution :
Inst. de Comput., Univ. de la Republica, Montevideo, Uruguay
Abstract :
We study the maximal values estimated by commonly used Markov model order estimators on individual sequences. We start with penalized maximum likelihood (PML) estimators with cost functions of the form - log Pk(xn) + f (n)αk, where Pk (xn) is the ML probability of the input sequence xn under a Markov model of order k, a is the size of the input alphabet, and f(n) is an increasing (penalization) function of n (the popular BIC estimator corresponds to f(n) = α - 1/2 log n). Comparison with a memoryless model yields a known upper bound k(n) on the maximum order that xn can estimate. We show that, under mild conditions on f that are satisfied by commonly used penalization functions, this simple bound is not far from tight, in the following sense: for sufficiently large n, and any k<;k̅(n), there are sequences xn that estimate order k; moreover, for all but a vanishing fraction of the values of n such that k = k̅(n), there are sequences xn that estimate order k. We also study KT-based MDL Markov order estimators, and show that in this case, there are sequences xn that estimate order n1/2-ϵ, which is much larger than the maximum log n/log α(l + o(1)) attainable by BIC, or the order o(log n) required for consistency of the KT estimator. In fact, for these sequences, limiting the allowed estimated order might incur in a significant asymptotic penalty in description length. All the results are constructive, and in each case we exhibit explicit sequences that attain the claimed estimated orders.
Keywords :
Markov processes; computational complexity; maximum likelihood estimation; BIC estimator; KT-based MDL Markov order estimators; PML estimators; asymptotic penalty; cost functions; description length; explicit sequences; individual sequences; input alphabet; penalized maximum likelihood estimators; Context; Electronic mail; Entropy; Markov processes; Maximum likelihood estimation; Upper bound;
Conference_Titel :
Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on
Conference_Location :
Cambridge, MA
Print_ISBN :
978-1-4673-2580-6
Electronic_ISBN :
2157-8095
DOI :
10.1109/ISIT.2012.6283023