Title :
Optimal Grid Exploitation Algorithms for Data Mining
Author :
Fiolet, Valerie ; Olejnik, Richard ; Lefait, Guillem ; Toursel, Bernard
Author_Institution :
Comput. Sci. Inst., Mons-Hainaut Univ., Mons
Abstract :
Although many data mining tasks have been parallelized and can thus be executed on dedicated clusters, few solutions currently exist for solving data mining problems on a grid or a non-specialized network of workstations. The current trend is to use grids and/or desktop grids in order to exploit any available workstations, regardless of their physical location. Although a grid-specific algorithm shares some characteristics with a dedicated-cluster algorithm, many constraints are inherent in the use of a grid. In particular, resource volatility and communication costs reduce the effectiveness of parallelism. The DisDaMin (distributed data mining) project revisits data mining tasks and proposes new algorithms that can be exploited on grids. The DisDaMin mechanisms first fragment the data in a specific way using clustering methods, and then apply asynchronous collaborative techniques suited to the specifics of execution on grids. This fragmentation method makes it possible to carry out optimal local processing on each node with a minimum of communication. Building on it, we introduce the distributed algorithm DICCoop, an adaptation of DIC by Brin et al. (1997). Simulations hosted on the French national grid GRID5000 (part of the European CoreGrid) were performed to demonstrate the efficiency of the proposed mechanisms. We analyse the impact of the numerous parameters on the optimization of parallel efficiency.
Keywords :
data mining; distributed algorithms; grid computing; DICCoop; DisDaMin project; distributed algorithm; distributed data mining; optimal grid exploitation algorithms; Association rules; Clustering algorithms; Computer science; Costs; Data analysis; Databases; Distributed computing; Workstations; Clustering; Data distribution
Conference_Title :
The Fifth International Symposium on Parallel and Distributed Computing, 2006 (ISPDC '06)
Conference_Location :
Timisoara
Print_ISBN :
0-7695-2638-1
DOI :
10.1109/ISPDC.2006.36