DocumentCode :
952277
Title :
Initializing Partition-Optimization Algorithms
Author :
Maitra, Ranjan
Author_Institution :
Dept. of Stat., Iowa State Univ., Ames, IA
Volume :
6
Issue :
1
fYear :
2009
Firstpage :
144
Lastpage :
157
Abstract :
Clustering datasets is a challenging problem that arises in a wide array of applications. Partition-optimization approaches, such as k-means or expectation-maximization (EM) algorithms, are sub-optimal and find solutions in the vicinity of their initialization. This paper proposes a staged approach to specifying initial values by finding a large number of local modes and then obtaining representatives from the most separated ones. Results on test experiments are excellent. We also provide a detailed comparative assessment of the suggested algorithm with many commonly used initialization approaches in the literature. Finally, the methodology is applied to two datasets on diurnal microarray gene expressions and industrial releases of mercury.
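The staged idea in the abstract (find many local modes, then keep representatives of the most separated ones) can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not say how modes are found or how separation is measured, so the sketch assumes that modes come from over-clustering with plain Lloyd iterations and that "most separated" means a greedy farthest-point choice. The names `lloyd`, `staged_init`, and `n_modes` are hypothetical.

```python
import numpy as np

def lloyd(X, centers, n_iter=50):
    """Plain Lloyd (k-means) iterations from the given starting centers."""
    centers = centers.copy()
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members):  # an empty cluster keeps its old center
                centers[j] = members.mean(axis=0)
    return centers, labels

def staged_init(X, k, n_modes=None, seed=0):
    """Return k starting centers (assumed two-stage scheme, see lead-in)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n_modes = n_modes or min(len(X), 5 * k)
    # Stage 1 (assumption): over-cluster to obtain many local modes.
    start = X[rng.choice(len(X), size=n_modes, replace=False)]
    modes, labels = lloyd(X, start)
    counts = np.bincount(labels, minlength=n_modes)
    modes, counts = modes[counts > 0], counts[counts > 0]
    if len(modes) < k:
        raise ValueError("fewer non-empty modes than requested centers")
    # Stage 2 (assumption): start from the most populated mode, then
    # repeatedly add the mode farthest from all modes chosen so far.
    chosen = [int(counts.argmax())]
    while len(chosen) < k:
        d2 = ((modes[:, None, :] - modes[chosen][None, :, :]) ** 2).sum(axis=2)
        chosen.append(int(d2.min(axis=1).argmax()))
    return modes[chosen]
```

The returned centers would then seed an ordinary k-means or EM run, for example `lloyd(X, staged_init(X, k=3))`.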
Keywords :
biology computing; environmental science computing; expectation-maximisation algorithm; genetics; industrial pollution; mercury (metal); molecular biophysics; pattern clustering; proteins; singular value decomposition; comparative assessment; data clustering; diurnal microarray gene expressions; expectation-maximization algorithm; k-means algorithm; mercury industrial release; methylmercury; multiGaussian mixtures; partition-optimization algorithm; protein localization; toxic release inventory; Clustering, classification, and association rules; Multivariate statistics; Singular value decomposition; Statistical methods; Algorithms; Arabidopsis; Chemical Hazard Release; Circadian Rhythm; Cluster Analysis; Computational Biology; Data Interpretation, Statistical; Escherichia coli Proteins; Humans; Industrial Waste; Methylmercury Compounds; Normal Distribution; Oligonucleotide Array Sequence Analysis; Pattern Recognition, Automated; Starch
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2007.70244
Filename :
4359893