Title :
A comparative analysis of dissimilarity measures for clustering categorical data
Author :
Xavier, Joao C. ; Canuto, Anne M. P. ; Almeida, Noriedson D. ; Goncalves, Luiz M. G.
Author_Institution :
Digital Metropolis Inst., Fed. Univ. of Rio Grande do Norte, Natal, Brazil
Abstract :
Similarity and dissimilarity (distance) between objects are important aspects that must be considered when clustering data. When clustering categorical data, for instance, these measures need to properly address the particularities of categorical data. In this paper, we perform a comparative analysis of four different dissimilarity measures used as distance metrics for clustering categorical data. The first is the Simple Matching Dissimilarity Measure (SMDM), one of the simplest and most widely used metrics for categorical attributes. The next two are context-based approaches (DIstance Learning in Categorical Attributes, DILCA, and Domain Value Dissimilarity, DVD), and the last is an extension of the SMDM, which is proposed in this paper. All four dissimilarities are applied as distance metrics in two well-known clustering algorithms, k-means and agglomerative hierarchical clustering. In this analysis, we also use internal and external cluster validity measures to compare the effectiveness of all four distance measures in both clustering algorithms.
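As an illustration of the simplest of the four measures, the sketch below shows how a simple matching dissimilarity is commonly computed: it counts the attributes on which two categorical objects disagree. The abstract does not give the formula, so this follows the usual textbook definition; the function name `simple_matching_dissimilarity` and the `normalize` option are assumptions made for this example, not the authors' notation, and the extension proposed in the paper is not reproduced here.

```python
from typing import Hashable, Sequence


def simple_matching_dissimilarity(
    a: Sequence[Hashable],
    b: Sequence[Hashable],
    normalize: bool = False,
) -> float:
    """Count the attributes on which two categorical objects differ.

    Assumption: both objects are described by the same attributes,
    given in the same order. With normalize=True the count is divided
    by the number of attributes, giving a value in [0, 1].
    """
    if len(a) != len(b):
        raise ValueError("objects must have the same number of attributes")

    # Each mismatching attribute contributes 1; each match contributes 0.
    mismatches = sum(1 for x, y in zip(a, b) if x != y)

    if normalize:
        # An object with no attributes is treated as identical to itself.
        return mismatches / len(a) if len(a) > 0 else 0.0
    return float(mismatches)


# Hypothetical objects with three categorical attributes each.
obj1 = ["red", "small", "round"]
obj2 = ["red", "large", "round"]
d = simple_matching_dissimilarity(obj1, obj2)
```

Here `obj1` and `obj2` differ only in the second attribute, so the unnormalized dissimilarity is one mismatch out of three attributes. Every mismatch carries the same weight regardless of which values are involved, which is the limitation that context-based measures such as DILCA and DVD aim to address.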
Keywords :
data handling; learning (artificial intelligence); pattern clustering; DILCA; DVD; SMDM; categorical data clustering; comparative analysis; context based approaches; dissimilarity measurement; distance learning in categorical attributes; distance metrics; domain value dissimilarity; simple matching dissimilarity measure; Clustering algorithms; DVD; Equations; Indexes; Machine learning algorithms; Weight measurement;
Conference_Title :
Neural Networks (IJCNN), The 2013 International Joint Conference on
Conference_Location :
Dallas, TX
Print_ISBN :
978-1-4673-6128-6
DOI :
10.1109/IJCNN.2013.6707039