Title :
Epidemic K-Means Clustering
Author :
Fatta, Giuseppe Di ; Blasa, Francesco ; Cafiero, Simone ; Fortino, Giancarlo
Author_Institution :
Sch. of Syst. Eng., Univ. of Reading, Reading, UK
Abstract :
The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or high latency in communication paths. This work proposes a fully decentralised algorithm (Epidemic K-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against the state of the art distributed K-Means algorithms based on sampling methods. The experimental analysis confirms that the proposed algorithm is a practical and accurate distributed K-Means implementation for networked systems of very large and extreme scale.
Keywords :
data mining; distributed memory systems; parallel algorithms; pattern clustering; cluster analysis; data mining methods; decentralised algorithm; distributed memory systems; epidemic k-means clustering; interconnection networks; networked systems; parallel algorithm; Algorithm design and analysis; Approximation algorithms; Approximation methods; Clustering algorithms; Peer to peer computing; Protocols; Vectors; Distributed clustering; K-Means; epidemic protocols; gossip-based aggregation; peer-to-peer data mining;
Conference_Titel :
Data Mining Workshops (ICDMW), 2011 IEEE 11th International Conference on
Conference_Location :
Vancouver, BC
Print_ISBN :
978-1-4673-0005-6
DOI :
10.1109/ICDMW.2011.76