Title :
Clustering Categorical Data Based on Representatives
Author :
Aranganayagi, S. ; Thangavel, K.
Abstract :
Clustering categorical data is a data mining technique that helps identify clusters within the domain space. In this paper we present a new method for clustering categorical data. This representative-based method works in three phases. In the first phase, the dissimilarity matrix, the neighbor matrix and the initial clusters are formed. In the second phase, clusters are merged by relocating objects using the neighborhood concept. In the third phase, the mode of each attribute is computed for every cluster, and Phase I and Phase II are applied to the tuples formed from these representatives. The proposed method is evaluated on well-known data sets from the UCI data repository: soybean, zoo and mushroom.
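The abstract does not state which dissimilarity measure or neighborhood rule the paper uses, so the following is only a minimal sketch of the building blocks it names (dissimilarity matrix, neighbor matrix, per-cluster mode). The simple-matching measure, the threshold `theta`, the sample rows and all function names are assumptions made for illustration, not the authors' definitions.

```python
from collections import Counter


def simple_matching(a, b):
    """Assumed measure: fraction of attributes on which two tuples differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def dissimilarity_matrix(rows):
    """Pairwise dissimilarity between all categorical tuples."""
    n = len(rows)
    return [[simple_matching(rows[i], rows[j]) for j in range(n)]
            for i in range(n)]


def neighbor_matrix(dist, theta):
    """1 where two objects are within the assumed threshold theta, else 0."""
    n = len(dist)
    return [[1 if dist[i][j] <= theta else 0 for j in range(n)]
            for i in range(n)]


def cluster_mode(rows):
    """Representative tuple: the most frequent value of each attribute."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


# Hypothetical toy data, not taken from the paper's data sets.
rows = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("blue", "large", "square"),
]
dist = dissimilarity_matrix(rows)
neigh = neighbor_matrix(dist, theta=0.5)
rep = cluster_mode(rows[:2])  # representative of a cluster holding rows 0 and 1
```

In a third phase of the kind the abstract describes, representatives such as `rep` would themselves be treated as tuples and passed through the same matrix construction again.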
Keywords :
data mining; categorical data; dissimilarity matrix; neighbor matrix; tuples; Application software; Art; Clustering algorithms; Computer science; Educational institutions; Electronic mail; Information technology; Predictive models; Unsupervised learning; clustering; dissimilarity; mode
Conference_Title :
Convergence and Hybrid Information Technology, 2008. ICCIT '08. Third International Conference on
Conference_Location :
Busan
Print_ISBN :
978-0-7695-3407-7
DOI :
10.1109/ICCIT.2008.337