DocumentCode
3055545
Title
The effect of data set characteristics on the choice of clustering validity index type
Author
Temizel, Tugba Taskaya ; Mizani, Mehrdad A. ; Inkaya, Tulin ; Yucebas, Sait Can
Author_Institution
Inf. Inst. METU Ankara, Ankara
fYear
2007
fDate
7-9 Nov. 2007
Firstpage
1
Lastpage
6
Abstract
Clustering techniques are widely used to give insight into the similarities and dissimilarities between data set items. Most algorithms require the user to tune parameters such as the number of clusters or the cut-off threshold in a dendrogram, and these parameters affect the clustering quality. In a good-quality clustering, intra-cluster similarity should be high and inter-cluster similarity low. Several cluster validity methods have been proposed to determine the optimal number of clusters. However, there is no guideline on which clustering validity methods can be used in conjunction with which clustering algorithms. In this paper, the Dunn and SD validity indices were applied to Kohonen self-organizing maps, k-means, and agglomerative clustering algorithms, and their limitations were shown empirically.
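As an illustration of the kind of validity index the abstract refers to, the following is a minimal sketch of the Dunn index in its classical form: the smallest distance between points in different clusters divided by the largest cluster diameter. Higher values suggest compact, well-separated clusters. The function name `dunn_index`, the Euclidean metric, and the toy data are assumptions made for this example; the record does not state which variant or implementation the paper used.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def dunn_index(points, labels):
    """Classical Dunn index (assumed variant): min separation / max diameter."""
    # Group the points by their cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the Dunn index needs at least two clusters")

    # Largest diameter: greatest distance between two points in one cluster.
    max_diameter = max(
        (dist(a, b)
         for members in clusters.values()
         for a, b in combinations(members, 2)),
        default=0.0,
    )

    # Smallest separation: least distance between points in different clusters.
    min_separation = min(
        dist(a, b)
        for first, second in combinations(list(clusters.values()), 2)
        for a in first
        for b in second
    )

    # All clusters are single points (or duplicates): treat as perfectly compact.
    if max_diameter == 0.0:
        return float("inf")
    return min_separation / max_diameter


# Hypothetical toy data: two tight pairs of points far apart on the x-axis.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
good = dunn_index(points, [0, 0, 1, 1])  # each pair forms its own cluster
bad = dunn_index(points, [0, 1, 0, 1])   # clusters mix the two pairs
print(good, bad)
```

To choose a cluster count with such an index, one would typically run the clustering algorithm for several candidate counts and keep the partition with the highest score. The pairwise loops are quadratic in the number of points, so this sketch suits small data sets only.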
Keywords
data handling; pattern clustering; self-organising feature maps; Kohonen self organizing maps; agglomerative clustering algorithms; cluster validity methods; clustering quality; clustering techniques; clustering validity index type; data set characteristics; data set items; dendrogram; intra-cluster similarity; k-means clustering algorithms; validity indices; Cities and towns; Cleaning; Clustering algorithms; Educational institutions; Employment; Frequency; Industrial engineering; Informatics; Partitioning algorithms; Self organizing feature maps;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer and Information Sciences, 2007. ISCIS 2007. 22nd International Symposium on
Conference_Location
Ankara
Print_ISBN
978-1-4244-1363-8
Electronic_ISBN
978-1-4244-1364-5
Type
conf
DOI
10.1109/ISCIS.2007.4456856
Filename
4456856