Title :
Efficient computation of normalized maximum likelihood coding for Gaussian mixtures with its applications to optimal clustering
Author :
Hirai, So ; Yamanishi, Kenji
Author_Institution :
Grad. Sch. of Inf. Sci. & Technol., Univ. of Tokyo, Tokyo, Japan
fDate :
July 31-Aug. 5, 2011
Abstract :
This paper addresses the issue of estimating, from a given data sequence, the number of mixture components of a Gaussian mixture model. Our approach is to compute the normalized maximum likelihood (NML) code-length of the data sequence relative to a Gaussian mixture model, and then to find the mixture size that minimizes the NML code-length. Selecting a model by minimizing the NML code-length is known as Rissanen's minimum description length (MDL) principle. For discrete domains, Kontkanen and Myllymäki proposed a method for efficiently computing the NML code-length for specific models; for continuous domains, however, how to compute the NML code-length efficiently has remained an open problem. We propose a method for efficiently computing the NML code-length for Gaussian mixture models. We develop it using an approximation of the NML code-length under a restriction of the domain, together with the technique of generating functions. We apply it to the problem of determining the optimal number of clusters in clustering with a Gaussian mixture model, where the mixture size is the number of clusters. Using artificial data sets and benchmark data sets, we empirically demonstrate that our estimate of the mixture size converges to the true one significantly faster than the estimates obtained with AIC and BIC.
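The abstract builds on the discrete-domain result of Kontkanen and Myllymäki. The following is a minimal sketch of one well-known instance of that result, assuming the multinomial (categorical) model over an alphabet of size K. The NML code-length is the negative maximized log-likelihood plus log C(K, n), where the normalizer C(K, n) sums the maximized likelihood over all K^n sequences of length n. The recurrence C(K+2, n) = C(K+1, n) + (n/K) C(K, n) evaluates it without enumerating sequences. The function names `multinomial_regret` and `multinomial_nml_code_length` are hypothetical, and the sketch does not reproduce the paper's Gaussian-mixture method (domain restriction plus generating functions).

```python
import math
from collections import Counter


def multinomial_regret(K: int, n: int) -> float:
    """Normalizer C(K, n) of the multinomial NML distribution.

    Seeds the recurrence with C(1, n) = 1 and an O(n) binomial sum for
    C(2, n), then applies C(k + 2, n) = C(k + 1, n) + (n / k) * C(k, n).
    """
    if K < 1 or n < 0:
        raise ValueError("need K >= 1 and n >= 0")
    if n == 0 or K == 1:
        return 1.0
    # C(2, n) = sum_h binom(n, h) (h/n)^h ((n-h)/n)^(n-h), with 0^0 = 1.
    # Each term is at most 1, so summing exp(log-term) cannot overflow.
    c2 = 0.0
    for h in range(n + 1):
        log_term = math.lgamma(n + 1) - math.lgamma(h + 1) - math.lgamma(n - h + 1)
        if h > 0:
            log_term += h * math.log(h / n)
        if n - h > 0:
            log_term += (n - h) * math.log((n - h) / n)
        c2 += math.exp(log_term)
    prev, curr = 1.0, c2  # C(1, n), C(2, n)
    for k in range(1, K - 1):
        # From C(k, n) and C(k + 1, n), compute C(k + 2, n).
        prev, curr = curr, curr + (n / k) * prev
    return curr


def multinomial_nml_code_length(data, K: int) -> float:
    """NML code-length (in nats) of a sequence over an alphabet of size K."""
    n = len(data)
    counts = Counter(data)
    if len(counts) > K:
        raise ValueError("data contain more distinct symbols than K")
    # Negative log-likelihood at the ML estimate theta_j = count_j / n.
    neg_log_ml = -sum(c * math.log(c / n) for c in counts.values())
    # Parametric complexity term log C(K, n).
    return neg_log_ml + math.log(multinomial_regret(K, n))
```

For example, `multinomial_regret(2, 2)` is 2.5, matching a direct enumeration of the four binary sequences of length 2 (1 + 1/4 + 1/4 + 1). Applying the MDL principle as the abstract describes means evaluating such a code-length for each candidate model size and keeping the minimizer; for Gaussian mixtures, the normalizer is the part the paper approximates.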
Keywords :
Gaussian distribution; approximation theory; maximum likelihood decoding; minimisation; pattern clustering; Gaussian mixture model; Kontkanen method; MDL principle; Myllymaki method; NML code-length; Rissanen minimum description length principle; data sequence; normalized maximum likelihood coding; optimal clustering; Approximation methods; Clustering algorithms; Complexity theory; Computational modeling; Data models; Gaussian distribution; Maximum likelihood estimation;
Conference_Title :
Information Theory Proceedings (ISIT), 2011 IEEE International Symposium on
Conference_Location :
St. Petersburg
Print_ISBN :
978-1-4577-0596-0
ISSN :
2157-8095
DOI :
10.1109/ISIT.2011.6033686