Title :
A Boosted Clustering Algorithm for Distributed Homogeneous Data Mining
Author :
Li, Chengan ; Wu, Tiejun
Author_Institution :
Inst. of Intelligent Syst. & Decision Making, Zhejiang Univ., Hangzhou
Abstract :
A new distributed clustering algorithm based on boosting techniques is presented to efficiently integrate multiple partitions constructed over very large, distributed homogeneous databases that cannot be merged at a single location. In the proposed method, individual clustering solutions are first produced from the disjoint datasets at each boosting round. The cluster prototypes, rather than the partition matrices, are then transferred to a single site to generate a global cluster prototype, which is broadcast to all distributed sites and used to partition the data at each site. Finally, all the individual solutions are combined into a weighted voting ensemble on each disjoint dataset. Experimental results demonstrate that the proposed distributed clustering method achieves clustering accuracy comparable to, or slightly better than, that of algorithms in which boosting techniques are applied to the centralized data. In addition, the communication cost of the proposed algorithm is very small.
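The prototype exchange described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the base clusterer, the merge rule, or the boosting weights, so the sketch assumes k-means as the local clusterer, a size-weighted k-means over the received prototypes as the merge step, and a deterministic farthest-point initialization. The boosting rounds and the weighted voting ensemble are omitted. All function names (`weighted_kmeans`, `local_prototypes`, `exchange_round`) are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))


def init_centers(points, k):
    """Farthest-point initialization (an assumption; keeps the sketch deterministic)."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    return centers


def weighted_kmeans(points, weights, k, iters=20):
    """Lloyd's iterations in which each point pulls its center with its weight."""
    centers = init_centers(points, k)
    dim = len(points[0])
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        mass = [0.0] * k
        for p, w in zip(points, weights):
            j = nearest(p, centers)
            mass[j] += w
            for d in range(dim):
                sums[j][d] += w * p[d]
        for j in range(k):
            if mass[j] > 0:  # keep the old center if a cluster is empty
                centers[j] = [s / mass[j] for s in sums[j]]
    return centers


def local_prototypes(data, k):
    """Runs at one site: cluster local data, return prototypes and cluster sizes."""
    centers = weighted_kmeans(data, [1.0] * len(data), k)
    sizes = [0] * k
    for p in data:
        sizes[nearest(p, centers)] += 1
    return centers, sizes


def exchange_round(sites, k):
    """One exchange: sites send prototypes, one site merges them, all sites relabel."""
    sent = [local_prototypes(data, k) for data in sites]
    protos = [c for centers, _ in sent for c in centers]
    weights = [float(s) for _, sizes in sent for s in sizes]
    global_protos = weighted_kmeans(protos, weights, k)  # merge at a single site
    labels = [[nearest(p, global_protos) for p in data] for data in sites]
    return global_protos, labels


# Two sites holding disjoint samples of the same two well-separated groups.
site_a = [(0.0, 0.1), (0.2, 0.0), (10.0, 10.1), (9.9, 10.0)]
site_b = [(0.1, 0.2), (10.2, 9.9), (10.0, 10.0)]
global_protos, labels = exchange_round([site_a, site_b], k=2)
print(labels)  # [[0, 0, 1, 1], [0, 1, 1]]
```

In this sketch each site transmits only k prototypes and k cluster sizes per exchange, independent of how many records it holds, which is consistent with the abstract's point about sending prototypes rather than partition matrices.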
Keywords :
data mining; distributed databases; pattern clustering; unsupervised learning; very large databases; boosted clustering algorithm; distributed clustering; distributed homogeneous database; global cluster prototype; very large database; Boosting; Broadcasting; Clustering algorithms; Clustering methods; Costs; Partitioning algorithms; Prototypes; Voting; Cluster ensembles; boosting strategy; partition schemes
Conference_Title :
Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on
Conference_Location :
Dalian
Print_ISBN :
1-4244-0332-4
DOI :
10.1109/WCICA.2006.1714221