• DocumentCode
    1664931
  • Title

    A Parallel Distributed Weka Framework for Big Data Mining Using Spark

  • Author

    Koliopoulos, Aris-Kyriakos ; Yiapanis, Paraskevas ; Tekiner, Firat ; Nenadic, Goran ; Keane, John

  • Author_Institution
    Sch. of Comput. Sci., Univ. of Manchester, Manchester, UK
  • fYear
    2015
  • Firstpage
    9
  • Lastpage
    16
  • Abstract
    Effective Big Data Mining requires scalable and efficient solutions that are also accessible to users of all levels of expertise. Despite this, many current efforts to provide effective knowledge extraction via large-scale Big Data Mining tools focus more on performance than on use and tuning, which are complex problems even for experts. Weka is a popular and comprehensive Data Mining workbench with a well-known and intuitive interface; nonetheless, it supports only sequential single-node execution. Hence, the size of the datasets and processing tasks that Weka can handle within its existing environment is limited both by the amount of memory in a single node and by sequential execution. This work discusses DistributedWekaSpark, a distributed framework for Weka which maintains its existing user interface. The framework is implemented on top of Spark, a Hadoop-related distributed framework with fast in-memory processing capabilities and support for iterative computations. By combining Weka's usability and Spark's processing power, DistributedWekaSpark provides a usable prototype distributed Big Data Mining workbench that achieves near-linear scaling in executing various real-world scale workloads: 91.4% weak scaling efficiency on average and up to 4x faster on average than Hadoop.
  • Keywords
    Big Data; data mining; parallel processing; user interfaces; DistributedWekaSpark; Hadoop-related distributed framework; Spark processing power; Weka usability; datasets size; distributed Big Data mining; fast in-memory processing capabilities; iterative computations; knowledge extraction; large-scale Big Data mining tools; parallel distributed Weka framework; processing tasks; sequential single-node execution; user interface; Algorithm design and analysis; Big data; Computational modeling; Data mining; Load modeling; Object oriented modeling; Sparks; Big Data; Data Mining; Distributed Systems; Machine Learning; Spark; Weka;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data (BigData Congress), 2015 IEEE International Congress on
  • Conference_Location
    New York, NY
  • Print_ISBN
    978-1-4673-7277-0
  • Type

    conf

  • DOI
    10.1109/BigDataCongress.2015.12
  • Filename
    7207196