DocumentCode
2893413
Title
A Hybrid Method for Estimating the Predominant Number of Clusters in a Data Set
Author
Alshaqsi, J.; Wang, Wenjia
Author_Institution
Dept. of Inf. Syst., Sultan Qaboos Univ., Muscat, Oman
Volume
2
fYear
2012
fDate
12-15 Dec. 2012
Firstpage
569
Lastpage
573
Abstract
In cluster analysis, determining the number of clusters, K, for a given dataset is an important yet difficult task: for non-trivial real-world problems there is often no universally accepted right or wrong answer, and the answer also depends on the context and purpose of the cluster study. This paper presents a new hybrid method for estimating the predominant number of clusters automatically. It employs a new similarity measure, calculates the lengths L of constant-similarity intervals, and treats the longest such intervals as representing the most probable numbers of clusters in the given context. An error function is defined to measure and evaluate the goodness of the estimates. The proposed method was tested on 3 synthetic datasets and 8 real-world benchmark datasets and compared with several other popular methods. The experimental results show that the proposed method determines the expected number of clusters for all the simulated datasets and for most of the benchmark datasets, and statistical tests indicate that it performs significantly better than the compared methods.
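The core selection step in the abstract (choose the K that stays constant over the longest interval) can be illustrated with a minimal sketch. The paper's own similarity measure and error function are not described in this record, so the sketch swaps in a standard stand-in: Euclidean single-linkage merge heights. The number of clusters is constant between consecutive merge heights, and the K with the longest such interval (its "lifetime") is picked. The function names `single_linkage_heights` and `longest_interval_k` are hypothetical; this is not the authors' implementation.

```python
import math
from itertools import combinations


def single_linkage_heights(points):
    """Return the n-1 single-linkage merge distances in ascending order.

    Stand-in for the paper's similarity measure (an assumption): Kruskal's
    algorithm with union-find, where each accepted edge merges two clusters,
    so its length is the height at which that merge happens.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    heights = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            heights.append(d)
            if len(heights) == n - 1:
                break
    return heights


def longest_interval_k(points):
    """Hypothetical helper: return (K, interval length) for the K whose
    cluster count stays constant over the widest distance interval,
    excluding the trivial K = 1 and K = n."""
    n = len(points)
    if n < 3:
        raise ValueError("need at least 3 points for a non-trivial K")
    h = single_linkage_heights(points)
    best_k, best_len = None, -1.0
    # After i merges (threshold in [h[i-1], h[i])) there are n - i clusters.
    for i in range(1, n - 1):
        length = h[i] - h[i - 1]
        if length > best_len:  # strict '>' keeps the larger K on ties
            best_k, best_len = n - i, length
    return best_k, best_len


if __name__ == "__main__":
    two_groups = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(longest_interval_k(two_groups))
```

Single linkage is used here only because its merge heights make the piecewise-constant relation between threshold and K explicit; other monotone hierarchical linkages would serve the same illustrative purpose.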
Keywords
pattern clustering; statistical testing; cluster analysis; constant similarity intervals; data set; error function; predominant cluster number; real-world benchmark datasets; statistical tests; synthetic datasets; Algorithm design and analysis; Benchmark testing; Clustering algorithms; Context; Iris; Length measurement; Measurement uncertainty; cluster number; similarity measure
fLanguage
English
Publisher
IEEE
Conference_Titel
2012 11th International Conference on Machine Learning and Applications (ICMLA)
Conference_Location
Boca Raton, FL
Print_ISBN
978-1-4673-4651-1
Type
conf
DOI
10.1109/ICMLA.2012.146
Filename
6406797