DocumentCode :
1442597
Title :
A Validity Index for Prototype-Based Clustering of Data Sets With Complex Cluster Structures
Author :
Taşdemir, Kadim ; Merényi, Erzsébet
Author_Institution :
Rice Univ., Houston, TX, USA
Volume :
41
Issue :
4
fYear :
2011
Firstpage :
1039
Lastpage :
1053
Abstract :
Evaluating how well the extracted clusters fit the true partitions of a data set is one of the fundamental challenges in unsupervised clustering, because the data structure and the number of clusters are unknown a priori. Cluster validity indices are commonly used to select the best partitioning from different clustering results; however, they are often inadequate unless clusters are well separated or have parametric shapes. Prototype-based clustering (finding clusters by grouping the prototypes obtained by vector quantization of the data), which is becoming increasingly important for its effectiveness in the analysis of large high-dimensional data sets, adds another dimension to this challenge. For validity assessment of prototype-based clusterings, previously proposed indices, mostly devised for the evaluation of point-based clusterings, usually perform poorly. The poor performance is made worse when the validity indices are applied to large data sets with complicated cluster structure. In this paper, we propose a new index, Conn_Index, which can be applied to data sets with a wide variety of clusters of different shapes, sizes, densities, or overlaps. We construct Conn_Index based on inter- and intra-cluster connectivities of prototypes. Connectivities are defined through a “connectivity matrix”, which is a weighted Delaunay graph where the weights indicate the local data distribution. Experiments on synthetic and real data indicate that Conn_Index outperforms the existing validity indices examined in this paper for the evaluation of prototype-based clustering results.
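The abstract describes the connectivity matrix only as a weighted graph over prototypes whose weights reflect the local data distribution. As a rough illustration, and not the authors' implementation, the sketch below assumes one plausible weighting: the weight between prototypes i and j is the number of data vectors whose two nearest prototypes are i and j. The function name `connectivity_matrix`, the Euclidean distance, and this counting rule are assumptions introduced here for illustration; the abstract does not state them.

```python
import numpy as np

def connectivity_matrix(data, prototypes):
    """Hypothetical helper: symmetric prototype-pair counts.

    Assumed weighting (not given in the abstract): entry (i, j) counts
    the data vectors whose nearest and second-nearest prototypes are
    i and j, in either order. Requires at least two prototypes.
    """
    data = np.asarray(data, dtype=float)
    prototypes = np.asarray(prototypes, dtype=float)
    # Squared Euclidean distance from every data vector to every prototype.
    d2 = ((data[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    # Indices of the two closest prototypes for each data vector.
    nearest_two = np.argsort(d2, axis=1)[:, :2]
    n = len(prototypes)
    counts = np.zeros((n, n), dtype=int)
    # np.add.at accumulates correctly when the same (i, j) pair repeats.
    np.add.at(counts, (nearest_two[:, 0], nearest_two[:, 1]), 1)
    # Symmetrize so the weight does not depend on which prototype is closer.
    return counts + counts.T

# Toy 1-D example: three prototypes, four data vectors.
prototypes = [[0.0], [1.0], [10.0]]
data = [[0.1], [0.9], [0.2], [9.5]]
conn = connectivity_matrix(data, prototypes)
```

In this toy case, prototypes 0 and 1 share three data vectors and so receive a large weight, while prototype 2 is linked to prototype 1 by a single vector. Under the stated assumption, strongly weighted pairs suggest prototypes in the same cluster and weakly weighted pairs suggest a separation.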
Keywords :
data structures; graph theory; mesh generation; pattern clustering; probability; unsupervised learning; Conn_Index; cluster validity indices; complex cluster structure; connectivity matrix; data structure; high-dimensional data sets; intercluster connectivities; intracluster connectivities; local data distribution; prototype based clustering; synthetic data; unsupervised clustering; vector quantization; weighted Delaunay graph; Data mining; Indexes; Lattices; Measurement; Prototypes; Shape; Topology; Cluster validity index; Conn_Index; complex data structure; connectivity; prototype-based clustering;
fLanguage :
English
Journal_Title :
Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on
Publisher :
IEEE
ISSN :
1083-4419
Type :
jour
DOI :
10.1109/TSMCB.2010.2104319
Filename :
5708184