• DocumentCode
    1442597
  • Title

    A Validity Index for Prototype-Based Clustering of Data Sets With Complex Cluster Structures

  • Author

    Taşdemir, Kadim ; Merényi, Erzsébet

  • Author_Institution
    Rice Univ., Houston, TX, USA
  • Volume
    41
  • Issue
    4
  • fYear
    2011
  • Firstpage
    1039
  • Lastpage
    1053
  • Abstract
    Evaluation of how well the extracted clusters fit the true partitions of a data set is one of the fundamental challenges in unsupervised clustering because the data structure and the number of clusters are unknown a priori. Cluster validity indices are commonly used to select the best partitioning from different clustering results; however, they are often inadequate unless clusters are well separated or have parametric shapes. Prototype-based clustering (finding clusters by grouping the prototypes obtained by vector quantization of the data), which is becoming increasingly important for its effectiveness in the analysis of large high-dimensional data sets, adds another dimension to this challenge. For validity assessment of prototype-based clusterings, previously proposed indices (mostly devised for the evaluation of point-based clusterings) usually perform poorly. The poor performance is made worse when the validity indices are applied to large data sets with complicated cluster structure. In this paper, we propose a new index, Conn_Index, which can be applied to data sets with a wide variety of clusters of different shapes, sizes, densities, or overlaps. We construct Conn_Index based on inter- and intra-cluster connectivities of prototypes. Connectivities are defined through a “connectivity matrix”, which is a weighted Delaunay graph where the weights indicate the local data distribution. Experiments on synthetic and real data indicate that Conn_Index outperforms the existing validity indices considered in this paper for the evaluation of prototype-based clustering results.
  • Keywords
    data structures; graph theory; mesh generation; pattern clustering; probability; unsupervised learning; Conn_Index; cluster validity indices; complex cluster structure; connectivity matrix; data structure; high-dimensional data sets; intercluster connectivities; intracluster connectivities; local data distribution; prototype based clustering; synthetic data; unsupervised clustering; vector quantization; weighted Delaunay graph; Data mining; Indexes; Lattices; Measurement; Prototypes; Shape; Topology; Cluster validity index; complex data structure; connectivity; prototype-based clustering
  • fLanguage
    English
  • Journal_Title
    Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1083-4419
  • Type

    jour

  • DOI
    10.1109/TSMCB.2010.2104319
  • Filename
    5708184