Title :
Clustering validity assessment: finding the optimal partitioning of a data set
Author :
Halkidi, Maria ; Vazirgiannis, Michalis
Abstract :
Clustering is a mostly unsupervised procedure, and the majority of clustering algorithms depend on certain assumptions in order to define the subgroups present in a data set. As a consequence, in most applications the resulting clustering scheme requires some sort of evaluation regarding its validity. In this paper we present a clustering validity procedure, which evaluates the results of clustering algorithms on data sets. We define a validity index, S_Dbw, based on well-defined clustering criteria, enabling the selection of optimal input parameter values for a clustering algorithm that result in the best partitioning of a data set. We evaluate the reliability of our index both theoretically and experimentally, considering three representative clustering algorithms run on synthetic and real data sets. We also carried out an evaluation study to compare the performance of S_Dbw with that of other known validity indices. Our approach performed favorably in all cases, even those in which other indices failed to indicate the correct partitions in a data set.
Keywords :
data mining; pattern clustering; SDbw validity index; clustering algorithms; clustering validity assessment; optimal partitioning data set; reliability; Clustering algorithms; Data visualization; Geometry; Humans; Informatics; Multidimensional systems; Partitioning algorithms; Radio access networks; Reliability theory; Visual perception
Conference_Title :
Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on
Conference_Location :
San Jose, CA
Print_ISBN :
0-7695-1119-8
DOI :
10.1109/ICDM.2001.989517