Abstract:
A learning machine is called singular if its Fisher information matrix is singular. Almost all learning machines used in information processing are singular; for example, layered neural networks, normal mixtures, binomial mixtures, Bayes networks, hidden Markov models, Boltzmann machines, stochastic context-free grammars, and reduced rank regressions are all singular. In a singular learning machine, the likelihood function cannot be approximated by any quadratic form of the parameter. Moreover, neither the distribution of the maximum likelihood estimator nor the Bayes a posteriori distribution converges to a normal distribution, even as the number of training samples tends to infinity. Therefore, conventional statistical learning theory does not hold for singular learning machines. This paper establishes a new mathematical foundation for singular learning machines. We show that, by using resolution of singularities, the likelihood function can be represented in a standard form, from which we prove the asymptotic behavior of the generalization errors of the maximum likelihood method and Bayes estimation. This result provides a base on which training algorithms for singular learning machines can be devised and optimized.
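To make the notion of singularity concrete, consider a toy model not taken from the paper but chosen as a minimal illustration: the Gaussian model p(x | a, b) = N(x; ab, 1), whose output depends on the two parameters only through their product, as in the simplest reduced rank regression. Its score vector is (x - ab)(b, a), which is always proportional to the fixed direction (b, a), so the Fisher information matrix has rank at most one and is singular at every parameter value. A short sketch, assuming NumPy, estimates the matrix by Monte Carlo and checks this:

```python
import numpy as np

def fisher_matrix(a, b, n_samples=200_000, seed=0):
    """Monte Carlo estimate of the Fisher information matrix for the
    toy model p(x | a, b) = N(x; a*b, 1) (an illustrative model,
    not one analyzed in the paper)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(a * b, 1.0, size=n_samples)
    # Score: gradient of log p(x | a, b) w.r.t. (a, b) is (x - a*b) * (b, a).
    score = (x - a * b)[:, None] * np.array([b, a])[None, :]
    # Fisher information = expectation of the outer product of the score.
    return score.T @ score / n_samples

fim = fisher_matrix(0.5, -0.3)
print(np.linalg.det(fim))       # numerically zero: the model is singular
print(np.linalg.eigvalsh(fim))  # one eigenvalue is (near) zero
```

Because the determinant vanishes identically, no quadratic (Laplace-type) approximation of the log likelihood is valid around such parameters, which is exactly the failure mode the abstract describes.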
Keywords:
Bayes methods; learning (artificial intelligence); matrix algebra; maximum likelihood estimation; Bayes a posteriori distribution; Bayes estimation; Fisher information matrix; information processing; learning machines; maximum likelihood estimator; statistical learning; computational intelligence; Gaussian distribution; hidden Markov models; learning systems; machine learning; neural networks; stochastic processes