Title :
Parallel EM-Clustering: Fast Convergence by Asynchronous Model Updates
Author :
Plant, Claudia ; Böhm, Christian
Author_Institution :
Florida State Univ., Tallahassee, FL, USA
Abstract :
The data explosion in many applications requires efficient data mining solutions. Fortunately, emerging technologies such as grid and cloud computing, high-performance multi-core processors, and graphics processing units offer the potential to keep pace with this growth and open up new opportunities for designing efficient algorithms. In this paper, we propose a parallel variant of the Expectation Maximization (EM) algorithm suitable for clustering large data sets in a distributed environment. The conventional EM algorithm iterates sequentially over two phases: in the E-step, points are assigned to the clusters, and in the M-step, the cluster models are updated. The basic idea of our approach is to allow asynchronous model updates for faster convergence and better use of the available resources. The frequency of the updates can be flexibly adjusted to the specific characteristics of the environment, including communication costs and the computing power of the individual devices. An extensive experimental evaluation demonstrates the benefits of our approach.
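The E-step/M-step loop described in the abstract, and the idea of refreshing the model more often than once per full pass, can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a single-process, one-dimensional Gaussian mixture in which each data block stands in for one worker's partition, and the model is recomputed from per-block sufficient statistics either after every block or only after a full pass. All names here (`e_step`, `m_step`, `em`, `n_blocks`, `update_every_block`) are hypothetical and chosen for illustration.

```python
import math


def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def e_step(block, weights, means, variances):
    """E-step on one block: softly assign points to clusters and return
    per-cluster sufficient statistics [sum r, sum r*x, sum r*x^2]."""
    k = len(weights)
    stats = [[0.0, 0.0, 0.0] for _ in range(k)]
    for x in block:
        dens = [weights[j] * gauss_pdf(x, means[j], variances[j]) for j in range(k)]
        total = sum(dens) or 1e-300  # guard against underflow to zero
        for j in range(k):
            r = dens[j] / total
            stats[j][0] += r
            stats[j][1] += r * x
            stats[j][2] += r * x * x
    return stats


def m_step(block_stats, n_points):
    """M-step: rebuild the model from the latest statistics of all blocks."""
    k = len(block_stats[0])
    weights, means, variances = [], [], []
    for j in range(k):
        s0 = max(sum(b[j][0] for b in block_stats), 1e-12)
        s1 = sum(b[j][1] for b in block_stats)
        s2 = sum(b[j][2] for b in block_stats)
        mean = s1 / s0
        weights.append(s0 / n_points)
        means.append(mean)
        variances.append(max(s2 / s0 - mean * mean, 1e-6))
    return weights, means, variances


def em(data, means, n_blocks=4, passes=30, update_every_block=True):
    """Run EM over data split into n_blocks simulated partitions.

    update_every_block=True refreshes the model after each block's E-step
    (frequent updates); False refreshes it once per full pass (classic EM).
    """
    k = len(means)
    weights, variances = [1.0 / k] * k, [1.0] * k
    blocks = [data[i::n_blocks] for i in range(n_blocks)]
    block_stats = [e_step(b, weights, means, variances) for b in blocks]
    weights, means, variances = m_step(block_stats, len(data))
    for _ in range(passes):
        for i, block in enumerate(blocks):
            block_stats[i] = e_step(block, weights, means, variances)
            if update_every_block:
                weights, means, variances = m_step(block_stats, len(data))
        if not update_every_block:
            weights, means, variances = m_step(block_stats, len(data))
    return weights, means, variances
```

Calling `em(data, [1.0, 8.0])` on a list of floats returns the fitted weights, means, and variances. In a real distributed setting the update frequency would be a tuning knob traded off against communication cost, as the abstract notes; here it is reduced to a single boolean for brevity.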
Keywords :
convergence; data mining; expectation-maximisation algorithm; parallel algorithms; pattern clustering; E-step; M-step; asynchronous model updates; communication cost; computing power; data explosion; expectation maximization; fast convergence; parallel EM clustering; parallel variant
Conference_Title :
Data Mining Workshops (ICDMW), 2010 IEEE International Conference on
Conference_Location :
Sydney, NSW
Print_ISBN :
978-1-4244-9244-2
Electronic_ISBN :
978-0-7695-4257-7
DOI :
10.1109/ICDMW.2010.53