Title :
Estimating the number of clusters in microarray data sets based on an information theoretic criterion
Author :
Nicorici, D. ; Astola, Jaakko ; Yli-Harja, O.
Author_Institution :
Inst. of Signal Process., Tampere Univ. of Technol.
Abstract :
This study focuses on an information theoretic approach for estimating the number of clusters, K, in microarray data sets. We present an automatic method for estimating K based on a particular version of the normalized maximum likelihood (NML) model. The strength of minimum description length (MDL) methods, such as the NML model, in statistical inference lies in finding the model structure, which in this clustering problem amounts to finding the best number of clusters and the best cluster structure for the data. The candidate models are compared using the NML code length. The study introduces a new method, based on the NML model, for computing the code length of the encoded clustering vector for the data samples. Experiments with publicly available microarray data sets demonstrate the ability of the new method to find biologically meaningful clusters.
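The abstract does not give the code length formula, so the following is only a rough sketch of what an NML code length for a clustering vector could look like. It assumes the cluster labels are coded with a multinomial NML model over K categories, which may differ from the particular NML version used in the paper. The function names `multinomial_regret` and `clustering_vector_code_length` are hypothetical, and the normalizer is computed with the recurrence C(k+2, n) = C(k+1, n) + (n/k) C(k, n) known from later literature (Kontkanen and Myllymäki), not necessarily the computation the authors propose.

```python
import math
from collections import Counter


def _xlogx_ratio(h, n):
    """Return h * ln(h / n), with the convention 0 * ln(0) = 0."""
    return 0.0 if h == 0 else h * math.log(h / n)


def multinomial_regret(num_categories, n):
    """Normalizing sum C(K, n) of the multinomial NML model (assumed model).

    C(1, n) = 1, C(2, n) is summed directly in the log domain, and larger K
    follow from C(k+2, n) = C(k+1, n) + (n / k) * C(k, n).
    """
    if num_categories < 1:
        raise ValueError("num_categories must be at least 1")
    if n == 0 or num_categories == 1:
        return 1.0
    c_prev = 1.0  # C(1, n)
    c_curr = 0.0  # will hold C(2, n)
    for h in range(n + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(h + 1) - math.lgamma(n - h + 1)
            + _xlogx_ratio(h, n) + _xlogx_ratio(n - h, n)
        )
        c_curr += math.exp(log_term)
    for k in range(1, num_categories - 1):
        c_prev, c_curr = c_curr, c_curr + (n / k) * c_prev
    return c_curr


def clustering_vector_code_length(labels, num_clusters):
    """Code length in bits of a label vector under multinomial NML.

    L = -sum_k h_k * log2(h_k / n) + log2 C(K, n), where h_k is the size of
    cluster k and n is the number of samples.
    """
    n = len(labels)
    counts = Counter(labels)
    if len(counts) > num_clusters:
        raise ValueError("labels use more clusters than num_clusters")
    ml_bits = -sum(h * math.log2(h / n) for h in counts.values())
    return ml_bits + math.log2(multinomial_regret(num_clusters, n))


if __name__ == "__main__":
    labels = [0, 0, 1, 1, 2, 2, 2, 2]  # made-up cluster assignments
    for k in (3, 4, 5):
        print(k, clustering_vector_code_length(labels, k))
```

In a complete criterion of the kind the abstract describes, this term would be added to the code length of the expression data given the clustering, and the K with the smallest total would be selected. The data term is omitted here because the abstract does not specify it.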
Keywords :
encoding; maximum likelihood estimation; clusters; encoded clustering vector; information theoretic criterion; microarray data sets; minimum description length; normalized maximum likelihood; statistical inference; Biological information theory; Biological system modeling; Biomedical signal processing; Clustering algorithms; Data analysis; Encoding; Gene expression; Maximum likelihood estimation; Signal processing algorithms; Statistics;
Conference_Title :
Statistical Signal Processing, 2005 IEEE/SP 13th Workshop on
Conference_Location :
Novosibirsk
Print_ISBN :
0-7803-9403-8
DOI :
10.1109/SSP.2005.1628741