DocumentCode :
3127468
Title :
Epidemic K-Means Clustering
Author :
Di Fatta, Giuseppe ; Blasa, Francesco ; Cafiero, Simone ; Fortino, Giancarlo
Author_Institution :
Sch. of Syst. Eng., Univ. of Reading, Reading, UK
fYear :
2011
fDate :
11-11 Dec. 2011
Firstpage :
151
Lastpage :
158
Abstract :
The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or high latency in communication paths. This work proposes a fully decentralised algorithm (Epidemic K-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against the state-of-the-art distributed K-Means algorithms based on sampling methods. The experimental analysis confirms that the proposed algorithm is a practical and accurate distributed K-Means implementation for networked systems of very large and extreme scale.
Keywords :
data mining; distributed memory systems; parallel algorithms; pattern clustering; cluster analysis; data mining methods; decentralised algorithm; distributed memory systems; epidemic k-means clustering; interconnection networks; networked systems; parallel algorithm; Algorithm design and analysis; Approximation algorithms; Approximation methods; Clustering algorithms; Peer to peer computing; Protocols; Vectors; Distributed clustering; K-Means; epidemic protocols; gossip-based aggregation; peer-to-peer data mining;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining Workshops (ICDMW), 2011 IEEE 11th International Conference on
Conference_Location :
Vancouver, BC
Print_ISBN :
978-1-4673-0005-6
Type :
conf
DOI :
10.1109/ICDMW.2011.76
Filename :
6137374