• DocumentCode
    3673605
  • Title
    A Multi-dimensional Comparison of Toolkits for Machine Learning with Big Data
  • Author
    Aaron N. Richter;Taghi M. Khoshgoftaar;Sara Landset;Tawfiq Hasanin
  • Author_Institution
    Florida Atlantic Univ., Boca Raton, FL, USA
  • fYear
    2015
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    Big data is a big business, and effective modeling of this data is key. This paper provides a comprehensive multidimensional analysis of various open source tools for machine learning with big data. An evaluation standard is proposed along with detailed comparisons of the frameworks discussed, with regard to algorithm availability, scalability, speed, and more. The major tools profiled are Mahout, MLlib, H2O, and SAMOA, along with the big data processing engines they utilize, including Hadoop MapReduce, Apache Spark, and Apache Storm. There is not yet one framework that "does it all", but this paper provides insight into each tool's strengths and weaknesses along with guidance on tool choice for specific needs.
  • Keywords
    "Sparks","Clustering algorithms","Water","Big data","Machine learning algorithms","Data models","Engines"
  • Publisher
    ieee
  • Conference_Titel
    Information Reuse and Integration (IRI), 2015 IEEE International Conference on
  • Type
    conf
  • DOI
    10.1109/IRI.2015.12
  • Filename
    7300948