Title :
Robust data clustering
Author :
Fred, Ana L.N. ; Jain, Anil K.
Abstract :
We address the problem of robust clustering by combining the data partitions produced by multiple clusterings, which together form a clustering ensemble. We formulate robust clustering under an information-theoretic framework: mutual information is the underlying concept used to define quantitative measures of agreement, or consistency, between data partitions. Robustness is assessed by the variance of cluster membership, based on bootstrapping. We propose and analyze a voting mechanism on pairwise associations of patterns for combining data partitions. We show that the proposed technique attempts to optimize the mutual-information-based criteria, although optimality is not ensured in all situations. This evidence accumulation method is demonstrated by using the well-known K-means algorithm to produce the clustering ensembles. Experimental results show the ability of the technique to identify clusters with arbitrary shapes and sizes.
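The voting mechanism on pairwise associations described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the ensemble is generated or how the final partition is extracted, so the choices below are assumptions made for illustration only: a plain Lloyd-style K-means with random initial centers, a co-association matrix holding the fraction of runs in which two patterns share a cluster, and a consensus partition obtained by linking pairs whose vote share exceeds 0.5. The function names (`kmeans_labels`, `co_association`, `consensus_partition`) are hypothetical and do not come from the paper.

```python
import random

def kmeans_labels(points, k, rng, iters=20):
    """Plain Lloyd-style K-means on tuples of floats; returns one label per point."""
    centers = rng.sample(points, k)  # random initial centers drawn from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def co_association(points, n_runs, k, seed=0):
    """Fraction of K-means runs in which each pair of points shares a cluster."""
    rng = random.Random(seed)
    n = len(points)
    votes = [[0] * n for _ in range(n)]
    for _ in range(n_runs):
        labels = kmeans_labels(points, k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    votes[i][j] += 1  # one vote for the pair (i, j)
    return [[v / n_runs for v in row] for row in votes]

def consensus_partition(coassoc, threshold=0.5):
    """Assumed combination rule: connected components of the graph that
    links every pair whose vote share is strictly above the threshold."""
    n = len(coassoc)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and coassoc[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels
```

Calling `co_association(points, n_runs, k)` and passing the result to `consensus_partition` yields one label per pattern. Because components are formed by chaining pairwise links rather than by distance to a centroid, a combination rule of this kind is not restricted to compact, spherical clusters, which is consistent with the abstract's claim about arbitrary shapes and sizes.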
Keywords :
data analysis; information theory; pattern clustering; K-means algorithm; arbitrary shape; arbitrary size; bootstrapping; cluster membership variance; clustering ensemble; consistency agreement; data clustering; data partition combination; evidence accumulation method; information-theoretical framework; multiple clustering; mutual information based criteria optimization; pairwise pattern association; voting mechanism analysis; Analysis of variance; Clustering algorithms; Computer science; Data engineering; Mutual information; Partitioning algorithms; Pattern analysis; Robustness; Shape; Voting;
Conference_Title :
Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on
Print_ISBN :
0-7695-1900-8
DOI :
10.1109/CVPR.2003.1211462