Title :
A Hierarchical Clustering Algorithm Based on K-Means with Constraints
Author :
Hang, GuoYan ; Zhang, DongMei ; Ren, Jiadong ; Hu, Changzhen
Author_Institution :
Coll. of Inf. Sci. & Eng., Yanshan Univ., Qinhuangdao, China
Abstract :
Hierarchical clustering is one of the most important tasks in data mining. However, existing hierarchical clustering algorithms are time-consuming and yield low clustering quality because they ignore constraints. This paper proposes a Hierarchical Clustering Algorithm based on K-means with Constraints (HCAKC). To improve clustering efficiency, HCAKC defines an Improved Silhouette to determine the optimal number of clusters. To improve hierarchical clustering quality, existing pairwise must-link and cannot-link constraints are used to update the cohesion matrix between clusters, and a penalty factor is introduced to modify the similarity metric so that constraint violations are addressed. Experimental results show that HCAKC has lower computational complexity and better clustering quality than the existing algorithm CSM.
Keywords :
computational complexity; constraint handling; data mining; pattern clustering; HCAKC; clustering quality; cohesion matrix; constraints; hierarchical clustering algorithm; improved Silhouette; k-means; penalty factor; similarity metric; Clustering algorithms; Computer science; Data analysis; Data engineering; Educational institutions; Information science; Iterative algorithms; Partitioning algorithms
Conference_Title :
Innovative Computing, Information and Control (ICICIC), 2009 Fourth International Conference on
Conference_Location :
Kaohsiung
Print_ISBN :
978-1-4244-5543-0
DOI :
10.1109/ICICIC.2009.18