Title :
Measurement of similarity using link based cluster approach for categorical data
Author :
Pavithra, M. ; Chandrakala, D.
Author_Institution :
Dept. of Comput. Sci. & Eng., Kumaraguru Coll. of Technol., Coimbatore, India
Abstract :
Clustering categorizes data into groups, or clusters, such that the data in the same cluster are more similar to each other than to those in different clusters. Cluster ensembles have been applied to the problem of clustering categorical data, but it is observed that these techniques generate a final data partition based on incomplete information. The underlying ensemble-information matrix presents only cluster-data point relations, with many entries left unknown. This problem degrades the quality of the clustering result. To improve clustering quality, a new link-based approach refines the conventional matrix by discovering unknown entries through the similarity between clusters in an ensemble, and an efficient link-based algorithm is proposed for the underlying similarity assessment. This paper proposes the C-Rank link-based algorithm to improve clustering quality and to rank clusters in weighted networks. C-Rank consists of three major phases: (1) identification of candidate clusters; (2) ranking the candidates by integrated cohesion; and (3) elimination of non-maximal clusters. Finally, to obtain the clustering result, a graph partitioning technique is applied to a weighted bipartite graph that is formulated from the refined matrix.
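The matrix refinement described in the abstract can be illustrated with a minimal sketch. It is not the C-Rank algorithm, whose details the abstract does not give. As a stand-in, it assumes a simple shared-neighbour score: clusters are linked by the Jaccard overlap of their members, and two clusters of the same base partition are scored by the links they share with other clusters. The function names, the `decay` factor, and the toy ensemble are all hypothetical.

```python
from itertools import combinations


def collect_clusters(base_clusterings):
    """List every cluster in the ensemble as (partition_index, member_set)."""
    clusters = []
    for p, labels in enumerate(base_clusterings):
        for lab in sorted(set(labels)):
            members = {i for i, l in enumerate(labels) if l == lab}
            clusters.append((p, members))
    return clusters


def jaccard(a, b):
    """Overlap of two member sets, used as the edge weight between clusters."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def refine_matrix(base_clusterings, decay=0.9):
    """Replace unknown (zero) entries of the point-by-cluster matrix with a
    link-based similarity. ASSUMPTION: a shared-neighbour score, not C-Rank."""
    clusters = collect_clusters(base_clusterings)
    n_points = len(base_clusterings[0])
    n_clusters = len(clusters)

    # Weighted cluster network: edge weight = member overlap between clusters.
    w = [[jaccard(clusters[x][1], clusters[y][1]) for y in range(n_clusters)]
         for x in range(n_clusters)]

    # Score each same-partition pair by the neighbours the two clusters share.
    raw = {}
    for x, y in combinations(range(n_clusters), 2):
        if clusters[x][0] != clusters[y][0]:
            continue
        raw[(x, y)] = sum(min(w[x][k], w[y][k])
                          for k in range(n_clusters) if k not in (x, y))
    top = max(raw.values(), default=0.0)

    def sim(x, y):
        # Normalise to [0, decay] so inferred entries stay below known ones.
        key = (min(x, y), max(x, y))
        return decay * raw[key] / top if top > 0 else 0.0

    refined = []
    for i in range(n_points):
        # The cluster that point i belongs to in each base partition.
        home = {p: x for x, (p, members) in enumerate(clusters) if i in members}
        refined.append([1.0 if i in members else sim(home[p], x)
                        for x, (p, members) in enumerate(clusters)])
    return refined


# Toy ensemble: two base clusterings of four data points (hypothetical data).
ensemble = [[0, 0, 1, 1], [0, 0, 0, 1]]
refined = refine_matrix(ensemble)
```

In this sketch the known memberships stay at 1.0 and every formerly unknown entry receives a value between 0 and `decay`. The final step named in the abstract, partitioning a weighted bipartite graph built from the refined matrix, is not shown here.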
Keywords :
data analysis; data mining; graph theory; matrix algebra; pattern clustering; C-rank link-based algorithm; candidate cluster identification; categorical data clustering problem; cluster ranking; cluster-data point relations; clustering quality; data categorization; data partition; ensemble-information matrix; graph partitioning technique; link based cluster approach; nonmaximal cluster elimination; refined matrix; similarity measurement; weighted bipartite graph; Algorithm design and analysis; Clustering algorithms; Computer science; Educational institutions; Entropy; Partitioning algorithms; Robustness; C-Rank link based cluster; Categorical data; Cluster Ensemble; Clustering; Data mining; link-based similarity; refined matrix;
Conference_Title :
Information Communication and Embedded Systems (ICICES), 2013 International Conference on
Conference_Location :
Chennai
Print_ISBN :
978-1-4673-5786-9
DOI :
10.1109/ICICES.2013.6508312