Title :
PARCLE: a parallel clustering algorithm for cluster system
Author :
Zhou, Bing ; Shen, Jun-yi ; Peng, Qin-ke
Author_Institution :
Inst. of Comput. Software, Xi'an Jiaotong Univ., China
Abstract :
As a low-cost, general-purpose parallel computing platform that is easy to use and dependable, the cluster system has become popular in many fields. Cluster analysis is one of the important problems in data mining. Because it is usually applied to large-scale databases or high-dimensional data, clustering demands substantial computing power, so developing parallel clustering algorithms for cluster systems deserves attention. This paper proposes PARCLE, a new parallel clustering algorithm for very large databases that is suited to cluster systems. The algorithm adopts data parallelism and asynchronous communication to reduce communication costs, and it applies a new clustering algorithm derived from BIRCH to improve clustering quality. Our implementation shows high speedups with negligible communication overhead, and its clustering results are no worse than those of the linear clustering algorithm.
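The abstract describes PARCLE only at a high level, so the following is a minimal Python sketch of the general pattern it names, not the authors' algorithm. It assumes BIRCH-style clustering features (a CF summarizes a point set as its count N, linear sum LS, and sum of squares SS, and CFs of disjoint sets add componentwise). Each worker summarizes its own data partition (data parallelism), and a coordinator folds in each worker's summaries as soon as that worker finishes, rather than waiting at a barrier (a stand-in for asynchronous communication). Threads stand in for cluster nodes. The names `absorb`, `local_summaries`, and `parallel_cluster` and the radius `threshold` rule are hypothetical, and BIRCH's CF tree is flattened into a plain list.

```python
# Hypothetical sketch of data-parallel, BIRCH-style summarization; not PARCLE itself.
# Threads stand in for cluster nodes; a real system would use message passing.
import math
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass


@dataclass
class CF:
    """BIRCH clustering feature: point count, linear sum, and sum of squares."""
    n: int
    ls: list[float]
    ss: float

    def merge(self, other: "CF") -> "CF":
        # CFs of disjoint point sets combine by componentwise addition.
        return CF(self.n + other.n,
                  [a + b for a, b in zip(self.ls, other.ls)],
                  self.ss + other.ss)

    def centroid(self) -> list[float]:
        return [x / self.n for x in self.ls]

    def radius(self) -> float:
        # RMS distance of the summarized points from their centroid.
        c = self.centroid()
        return math.sqrt(max(self.ss / self.n - sum(x * x for x in c), 0.0))


def point_cf(p) -> CF:
    return CF(1, list(p), sum(x * x for x in p))


def absorb(cfs: list[CF], new: CF, threshold: float) -> None:
    """Hypothetical rule: merge `new` into the nearest CF if the merged
    radius stays within `threshold`; otherwise keep it as a new subcluster."""
    if cfs:
        c = new.centroid()
        i = min(range(len(cfs)), key=lambda k: math.dist(cfs[k].centroid(), c))
        merged = cfs[i].merge(new)
        if merged.radius() <= threshold:
            cfs[i] = merged
            return
    cfs.append(new)


def local_summaries(partition, threshold: float) -> list[CF]:
    # Runs on one worker; only these compact CFs, not raw points, are returned.
    cfs: list[CF] = []
    for p in partition:
        absorb(cfs, point_cf(p), threshold)
    return cfs


def parallel_cluster(data, n_workers: int, threshold: float) -> list[CF]:
    # Data parallelism: round-robin split of the points across workers.
    parts = [data[w::n_workers] for w in range(n_workers)]
    result: list[CF] = []
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(local_summaries, part, threshold) for part in parts]
        # Fold in each worker's summaries as soon as it completes.
        for fut in as_completed(futures):
            for cf in fut.result():
                absorb(result, cf, threshold)
    return result


if __name__ == "__main__":
    data = [(0, 0), (0.1, 0.2), (0.2, 0.1), (10, 10), (10.1, 10.2), (10.2, 9.9)]
    for cf in parallel_cluster(data, n_workers=2, threshold=1.0):
        print(cf.n, [round(x, 2) for x in cf.centroid()])
```

On this toy input the two well-separated groups should come back as two subclusters of three points each. The design choice being illustrated is that only fixed-size CF summaries cross worker boundaries, so communication grows with the number of subclusters rather than with the number of points.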
Keywords :
data mining; database management systems; parallel databases; workstation clusters; PARCLE; asynchronous communication; cluster system; data parallelism; high-dimension data; large-scale databases; linear clustering algorithm; parallel clustering algorithm; Asynchronous communication; Clustering algorithms; Concurrent computing; Data mining; Databases; Machine learning algorithms; Parallel processing; Partitioning algorithms; Software; Systems engineering and theory;
Conference_Title :
Machine Learning and Cybernetics, 2003 International Conference on
Print_ISBN :
0-7803-8131-9
DOI :
10.1109/ICMLC.2003.1264431