• DocumentCode
    1142005
  • Title
    A Point Symmetry-Based Clustering Technique for Automatic Evolution of Clusters
  • Author
    Bandyopadhyay, Sanghamitra ; Saha, Sriparna
  • Author_Institution
    Machine Intell. Unit, Indian Stat. Inst., Kolkata
  • Volume
    20
  • Issue
    11
  • fYear
    2008
  • Firstpage
    1441
  • Lastpage
    1457
  • Abstract
    In this paper, a new symmetry-based genetic clustering algorithm is proposed which automatically evolves the number of clusters as well as the proper partitioning from a data set. Strings comprise both real numbers and the don't care symbol in order to encode a variable number of clusters. Here, assignment of points to different clusters is done based on a point symmetry (PS)-based distance rather than the Euclidean distance. A newly proposed PS-based cluster validity index, sym-index, is used as a measure of the validity of the corresponding partitioning. The algorithm is, therefore, able to detect both convex and nonconvex clusters irrespective of their sizes and shapes as long as they possess the symmetry property. Kd-tree-based nearest neighbor search is used to reduce the complexity of computing the PS-based distance. A proof of the convergence property of the variable string length genetic algorithm with PS-distance-based clustering (VGAPS-clustering) technique is also provided. The effectiveness of VGAPS-clustering compared to the variable string length genetic K-means algorithm (GCUK-clustering) and one recently developed weighted sum validity function-based hybrid niching genetic algorithm (HNGA-clustering) is demonstrated for nine artificial and five real-life data sets.
  • Keywords
    computational complexity; convergence; convex programming; genetic algorithms; pattern classification; pattern clustering; tree searching; Kd-tree-based nearest neighbor search; cluster validity index; convergence property; convex clusters; genetic clustering algorithm; nonconvex clusters; point symmetry-based distance; sym-index; unsupervised classification; variable string length genetic algorithm; Algorithms; Evolutionary computing and genetic algorithms; Pattern Recognition; Similarity measures
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2008.79
  • Filename
    4497194