DocumentCode :
3064886
Title :
Hybrid Bisect K-Means Clustering Algorithm
Author :
Murugesan, Keerthiram ; Zhang, Jun
Author_Institution :
Dept. of Comput. Sci., Univ. of Kentucky, Lexington, KY, USA
fYear :
2011
fDate :
29-31 July 2011
Firstpage :
216
Lastpage :
219
Abstract :
In this paper, we present a hybrid clustering algorithm that combines divisive and agglomerative hierarchical clustering. Our method uses bisect K-means as the divisive algorithm and the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) as the agglomerative algorithm. First, we cluster the document collection using the bisect K-means algorithm with a value K´ that is greater than the target number of clusters, K. Second, we calculate the centroids of the K´ clusters obtained in the previous step. Then we apply the UPGMA agglomerative hierarchical algorithm to these centroids for the given value K. After UPGMA groups the K´ centroids into K clusters, if two centroids end up in the same cluster, all of their documents are assigned to that cluster. We compared the quality of the clusters generated by bisect K-means and by the proposed hybrid algorithm, measured with various cluster evaluation metrics. Our experimental results show that the proposed method outperforms the standard bisect K-means algorithm.
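The three steps in the abstract (over-cluster with bisect K-means into K´ clusters, compute centroids, merge the centroids with UPGMA down to K) can be sketched roughly as below. This is only an illustrative sketch and not the authors' implementation: the function names (`two_means`, `bisect_kmeans`, `upgma`, `hybrid_cluster`), the use of Euclidean distance, the rule of always splitting the largest cluster, and the fixed random seed are all assumptions that the abstract does not specify.

```python
import math
import random

def dist(a, b):
    # Euclidean distance (an assumption; document clustering often uses cosine similarity).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    # Component-wise mean of a non-empty list of vectors.
    return [sum(col) / len(points) for col in zip(*points)]

def two_means(data, idx, rng, iters=20):
    # Split the points indexed by idx into two groups with plain 2-means.
    a, b = rng.sample(idx, 2)
    c0, c1 = data[a], data[b]
    for _ in range(iters):
        left, right = [], []
        for i in idx:
            (left if dist(data[i], c0) <= dist(data[i], c1) else right).append(i)
        if not left or not right:
            # Degenerate split (e.g. duplicate seeds): fall back to halving.
            mid = len(idx) // 2
            return idx[:mid], idx[mid:]
        c0 = centroid([data[i] for i in left])
        c1 = centroid([data[i] for i in right])
    return left, right

def bisect_kmeans(data, k_prime, rng):
    # Divisive step: repeatedly bisect the largest cluster until k_prime clusters exist.
    clusters = [list(range(len(data)))]
    while len(clusters) < k_prime:
        clusters.sort(key=len)
        biggest = clusters.pop()
        if len(biggest) < 2:
            clusters.append(biggest)
            break
        clusters.extend(two_means(data, biggest, rng))
    return clusters

def upgma(cents, k):
    # Agglomerative step: average-linkage merging of centroids until k groups remain.
    groups = [[i] for i in range(len(cents))]

    def avg_link(g, h):
        return sum(dist(cents[i], cents[j]) for i in g for j in h) / (len(g) * len(h))

    while len(groups) > k:
        pairs = ((x, y) for x in range(len(groups)) for y in range(x + 1, len(groups)))
        a, b = min(pairs, key=lambda p: avg_link(groups[p[0]], groups[p[1]]))
        merged = groups.pop(b)  # b > a, so index a stays valid
        groups[a].extend(merged)
    return groups

def hybrid_cluster(data, k, k_prime, seed=0):
    # Returns one label in range(k) per input vector.
    rng = random.Random(seed)
    fine = bisect_kmeans(data, k_prime, rng)
    cents = [centroid([data[i] for i in c]) for c in fine]
    labels = [None] * len(data)
    for label, group in enumerate(upgma(cents, k)):
        for c in group:
            # All documents of centroids merged together share one final label.
            for i in fine[c]:
                labels[i] = label
    return labels
```

Calling `hybrid_cluster(vectors, k=3, k_prime=6)` on a list of equal-length numeric vectors returns one integer label per vector. The ratio between K´ and K is a tuning choice; the abstract only requires K´ to be greater than K.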
Keywords :
document handling; pattern clustering; UPGMA agglomerative hierarchical algorithm; agglomerative hierarchical clustering algorithm; arithmetic mean; document clustering; k-means algorithm; unweighted pair group method; Clustering algorithms; Complexity theory; Computer science; Entropy; Hybrid power systems; Measurement; Partitioning algorithms; Bisect K-means; hybrid algorithm
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Business Computing and Global Informatization (BCGIN), 2011 International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4577-0788-9
Electronic_ISBN :
978-0-7695-4464-9
Type :
conf
DOI :
10.1109/BCGIn.2011.62
Filename :
6003884