Title :
A New Clustering Algorithm Based on Regions of Influence with Self-Detection of the Best Number of Clusters
Author :
Muhlenbach, Fabrice ; Lallich, Stéphane
Author_Institution :
Lab. Hubert Curien, Univ. de Lyon, St. Etienne, France
Abstract :
Clustering methods usually require to know the best number of clusters, or another parameter, e.g. a threshold, which is not ever easy to provide. This paper proposes a new graph-based clustering method called GBC which detects automatically the best number of clusters, without requiring any other parameter. In this method based on regions of influence, a graph is constructed and the edges of the graph having the higher values are cut according to a hierarchical divisive procedure. An index is calculated from the size average of the cut edges which self-detects the more appropriate number of clusters. The results of GBC for 3 quality indices (Dunn, Silhouette and Davies-Bouldin) are compared with those of K-Means, Ward´s hierarchical clustering method and DBSCAN on 8 benchmarks. The experiments show the good performance of GBC in the case of well separated clusters, even if the data are unbalanced, non-convex or with presence of outliers, whatever the shape of the clusters.
Keywords :
unsupervised learning; DBSCAN; Davies-Bouldin indices; Dunn indices; K-means; Silhouette indices; Ward hierarchical clustering method; graph-based clustering method; hierarchical divisive procedure; unsupervised machine learning task; Bioinformatics; Clustering algorithms; Clustering methods; Data mining; Image edge detection; Iterative algorithms; Machine learning; Partitioning algorithms; Shape; Tree graphs; clustering; neighborhood graph;
Conference_Titel :
Data Mining, 2009. ICDM '09. Ninth IEEE International Conference on
Conference_Location :
Miami, FL
Print_ISBN :
978-1-4244-5242-2
Electronic_ISBN :
1550-4786
DOI :
10.1109/ICDM.2009.133