DocumentCode
1142005
Title
A Point Symmetry-Based Clustering Technique for Automatic Evolution of Clusters
Author
Bandyopadhyay, Sanghamitra ; Saha, Sriparna
Author_Institution
Machine Intell. Unit, Indian Stat. Inst., Kolkata
Volume
20
Issue
11
fYear
2008
Firstpage
1441
Lastpage
1457
Abstract
In this paper, a new symmetry-based genetic clustering algorithm is proposed that automatically evolves both the number of clusters and the proper partitioning of a data set. Strings comprise both real numbers and the don't care symbol in order to encode a variable number of clusters. Here, assignment of points to different clusters is done based on a point symmetry (PS)-based distance rather than the Euclidean distance. A newly proposed PS-based cluster validity index, sym-index, is used as a measure of the validity of the corresponding partitioning. The algorithm is, therefore, able to detect both convex and nonconvex clusters irrespective of their sizes and shapes as long as they possess the symmetry property. Kd-tree-based nearest neighbor search is used to reduce the complexity of computing the PS-based distance. A proof of the convergence property of the variable string length genetic algorithm with PS-distance-based clustering (VGAPS-clustering) technique is also provided. The effectiveness of VGAPS-clustering compared to the variable string length genetic K-means algorithm (GCUK-clustering) and a recently developed weighted sum validity function-based hybrid niching genetic algorithm (HNGA-clustering) is demonstrated for nine artificial and five real-life data sets.
Keywords
computational complexity; convergence; convex programming; genetic algorithms; pattern classification; pattern clustering; tree searching; Kd-tree-based nearest neighbor search; cluster validity index; convergence property; convex clusters; genetic clustering algorithm; nonconvex clusters; point symmetry-based distance; sym-index; unsupervised classification; variable string length genetic algorithm; Algorithms; Evolutionary computing and genetic algorithms; Pattern Recognition; Similarity measures
fLanguage
English
Journal_Title
Knowledge and Data Engineering, IEEE Transactions on
Publisher
IEEE
ISSN
1041-4347
Type
jour
DOI
10.1109/TKDE.2008.79
Filename
4497194