Title :
Toolkit-Based High-Performance Data Mining of Large Data on MapReduce Clusters
Author :
Wegener, Dennis ; Mock, Michael ; Adranale, Deyaa ; Wrobel, Stefan
Author_Institution :
Fraunhofer Inst. Intell. Anal. & Inf. Syst. IAIS, St. Augustin, Germany
Abstract :
The enormous growth of data in a variety of applications has increased the need for high-performance data mining based on distributed environments. However, standard data mining toolkits per se do not allow the usage of computing clusters. The success of MapReduce for analyzing large data has raised a general interest in applying this model to other data-intensive applications. Unfortunately, current research has not led to an integration of GUI-based data mining toolkits with MapReduce systems based on distributed file systems. This paper defines novel principles for modeling and design of the user interface, the storage model, and the computational model necessary for the integration of such systems. Additionally, it introduces a novel system architecture for interactive GUI-based data mining of large data on clusters based on MapReduce that overcomes the limitations of data mining toolkits. As an empirical demonstration, we show an implementation based on Weka and Hadoop.
Keywords :
data mining; graphical user interfaces; MapReduce clusters; MapReduce system; computational model; computing clusters; data analysis; data intensive application; distributed environment; distributed file system; interactive GUI; standard data mining toolkit; system architecture; toolkit-based high performance data mining; Cloud computing; Clustering algorithms; Computer networks; Conferences; Costs; Data mining; Data processing; Decision trees; Machine learning algorithms; Training data
Conference_Title :
Data Mining Workshops, 2009. ICDMW '09. IEEE International Conference on
Conference_Location :
Miami, FL
Print_ISBN :
978-1-4244-5384-9
Electronic_ISBN :
978-0-7695-3902-7
DOI :
10.1109/ICDMW.2009.34