DocumentCode :
1664931
Title :
A Parallel Distributed Weka Framework for Big Data Mining Using Spark
Author :
Koliopoulos, Aris-Kyriakos ; Yiapanis, Paraskevas ; Tekiner, Firat ; Nenadic, Goran ; Keane, John
Author_Institution :
Sch. of Comput. Sci., Univ. of Manchester, Manchester, UK
fYear :
2015
Firstpage :
9
Lastpage :
16
Abstract :
Effective Big Data Mining requires scalable and efficient solutions that are also accessible to users of all levels of expertise. Despite this, many current efforts to provide effective knowledge extraction via large-scale Big Data Mining tools focus more on performance than on use and tuning, which are complex problems even for experts. Weka is a popular and comprehensive Data Mining workbench with a well-known and intuitive interface; nonetheless, it supports only sequential single-node execution. Hence, the size of the datasets and processing tasks that Weka can handle within its existing environment is limited both by the amount of memory in a single node and by sequential execution. This work discusses DistributedWekaSpark, a distributed framework for Weka which maintains its existing user interface. The framework is implemented on top of Spark, a Hadoop-related distributed framework with fast in-memory processing capabilities and support for iterative computations. By combining Weka's usability and Spark's processing power, DistributedWekaSpark provides a usable prototype distributed Big Data Mining workbench that achieves near-linear scaling in executing various real-world scale workloads: 91.4% weak scaling efficiency on average and up to 4x faster on average than Hadoop.
Keywords :
Big Data; data mining; parallel processing; user interfaces; DistributedWekaSpark; Hadoop-related distributed framework; Spark processing power; Weka usability; datasets size; distributed Big Data mining; fast in-memory processing capabilities; iterative computations; knowledge extraction; large-scale Big Data mining tools; parallel distributed Weka framework; processing tasks; sequential single-node execution; user interface; Algorithm design and analysis; Big data; Computational modeling; Data mining; Load modeling; Object oriented modeling; Sparks; Big Data; Data Mining; Distributed Systems; Machine Learning; Spark; Weka;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Big Data (BigData Congress), 2015 IEEE International Congress on
Conference_Location :
New York, NY
Print_ISBN :
978-1-4673-7277-0
Type :
conf
DOI :
10.1109/BigDataCongress.2015.12
Filename :
7207196