DocumentCode :
1465851
Title :
Coordination of Cluster Ensembles via Exact Methods
Author :
Christou, Ioannis T.
Author_Institution :
Athens Inf. Technol., Paiania, Greece
Volume :
33
Issue :
2
fYear :
2011
Firstpage :
279
Lastpage :
293
Abstract :
We present a novel optimization-based method for the combination of cluster ensembles for the class of problems with intracluster criteria, such as Minimum-Sum-of-Squares-Clustering (MSSC). We propose a simple and efficient algorithm-called EXAMCE-for this class of problems that is inspired from a Set-Partitioning formulation of the original clustering problem. We prove some theoretical properties of the solutions produced by our algorithm, and in particular that, under general assumptions, though the algorithm recombines solution fragments so as to find the solution of a Set-Covering relaxation of the original formulation, it is guaranteed to find better solutions than the ones in the ensemble. For the MSSC problem in particular, a prototype implementation of our algorithm found a new better solution than the previously best known for 21 of the test instances of the 40-instance TSPLIB benchmark data sets used in, and and found a worse-quality solution than the best known only five times. For other published benchmark data sets where the optimal MSSC solution is known, we match them. The algorithm is particularly effective when the number of clusters is large, in which case it is able to escape the local minima found by K-means type algorithms by recombining the solutions in a Set-Covering context. We also establish the stability of the algorithm with extensive computational experiments, by showing that multiple runs of EXAMCE for the same clustering problem instance produce high-quality solutions whose Adjusted Rand Index is consistently above 0.95. Finally, in experiments utilizing external criteria to compute the validity of clustering, EXAMCE is capable of producing high-quality results that are comparable in quality to those of the best known clustering algorithms.
Keywords :
algorithm theory; data handling; relaxation theory; set theory; K-means type algorithm; TSPLIB benchmark data set; adjusted Rand index; cluster ensemble; clustering algorithm; exact method; minimum-sum-of-squares-clustering; optimization-based method; set-covering context; set-covering relaxation; set-partitioning formulation; Clustering; combinatorial algorithms.; constrained optimization; machine learning;
fLanguage :
English
Journal_Title :
Pattern Analysis and Machine Intelligence, IEEE Transactions on
Publisher :
ieee
ISSN :
0162-8828
Type :
jour
DOI :
10.1109/TPAMI.2010.85
Filename :
5444880
Link To Document :
بازگشت