Abstract:
In a recent paper, Grünwald and Langford showed that MDL and Bayesian inference can be statistically inconsistent in a classification context when the model is wrong. They presented a countable family M = {P1, P2, ...} of probability distributions, a "true" distribution P* outside M, and a Bayesian prior distribution Π on M, such that M contains a distribution Q within a small KL divergence δ > 0 from P* and with substantial prior mass, e.g. Π(Q) = 1/2. Nevertheless, when data are i.i.d. (independent and identically distributed) according to P*, then, no matter how many data are observed, the Bayesian posterior puts nearly all its mass on distributions at a distance from P* much larger than δ. As a result, classification based on the Bayesian posterior can perform substantially worse than random guessing, no matter how many data are observed, even though the classifier based on Q performs much better than random guessing. Similarly, with probability 1, the distribution inferred by two-part MDL has KL divergence to P* tending to infinity and performs much worse than Q in classification; intriguingly, though, in contrast to the full Bayesian predictor, for large n the two-part MDL estimator never performs worse than random guessing.
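To make the objects in the abstract concrete, the following minimal Python sketch (not from the paper; all numbers and names such as p_star, model, and prior are hypothetical) computes a Bayesian posterior over a small misspecified Bernoulli family, together with each element's KL divergence from the true distribution P*. Note that in this simple finite family the posterior does concentrate on the KL-closest element Q; Grünwald and Langford's result is that this well-behaved outcome can fail for a carefully constructed countable family in a classification setting.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical toy setup: the true Bernoulli parameter p_star lies
    # outside the model family M, which contains only the values in `model`.
    p_star = 0.3                          # "true" distribution P*, not in M
    model = np.array([0.1, 0.5, 0.9])     # model family M = {P1, P2, P3}
    prior = np.array([0.25, 0.5, 0.25])   # prior Pi with Pi(Q) = 1/2 on the
                                          # KL-closest element (here Ber(0.5))

    def kl_bernoulli(p, q):
        # KL divergence D(Ber(p) || Ber(q))
        return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

    n = 1000
    x = rng.random(n) < p_star            # n i.i.d. samples from P*
    k = x.sum()                           # number of ones observed

    # Log-likelihood of each model element, then the normalized posterior.
    loglik = k * np.log(model) + (n - k) * np.log(1 - model)
    logpost = np.log(prior) + loglik
    post = np.exp(logpost - logpost.max())
    post /= post.sum()

    for p, w, d in zip(model, post, kl_bernoulli(p_star, model)):
        print(f"Ber({p}): posterior = {w:.4f}, D(P* || P) = {d:.4f}")

In this toy run the posterior mass ends up almost entirely on Ber(0.5), the element at KL divergence δ ≈ 0.082 from P*. The inconsistency phenomenon described above requires the infinite family and classification loss of the paper's construction and does not arise in a setting this simple.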
Keywords:
Bayes methods; belief networks; inference mechanisms; learning (artificial intelligence); statistical distributions; Bayesian inference; Bayesian posterior; Bayesian predictor; Bayesian prior distribution; MDL estimator; model; probability distribution; random guessing; safe learning