Abstract :
Clustering categorical data has received increasing attention in recent years, but several aspects of existing algorithms, such as the interpretability of the discovered clusters and the impact of data selection order, have not been well addressed. This paper proposes a novel categorical data clustering algorithm called CLUBMIS, which can effectively find interesting clusters. In addition, the clusters can be easily interpreted through the maximal frequent itemsets used in the clustering process. Unlike most hierarchical clustering algorithms, CLUBMIS clusters datasets based on summarized information, i.e., maximal frequent itemsets, and thus eliminates the effect of data selection order.
Keywords :
data handling; pattern clustering; CLUBMIS; categorical data clustering; maximal frequent itemsets; Application software; Clustering algorithms; Computer science; Cost function; Data engineering; Educational institutions; Itemsets; Machine learning; Machine learning algorithms; Systems engineering and theory