DocumentCode
3673605
Title
A Multi-dimensional Comparison of Toolkits for Machine Learning with Big Data
Author
Aaron N. Richter;Taghi M. Khoshgoftaar;Sara Landset;Tawfiq Hasanin
Author_Institution
Florida Atlantic Univ., Boca Raton, FL, USA
fYear
2015
Firstpage
1
Lastpage
8
Abstract
Big data is a big business, and effective modeling of this data is key. This paper provides a comprehensive multidimensional analysis of various open source tools for machine learning with big data. An evaluation standard is proposed along with detailed comparisons of the frameworks discussed, with regard to algorithm availability, scalability, speed, and more. The major tools profiled are Mahout, MLlib, H2O, and SAMOA, along with the big data processing engines they utilize, including Hadoop MapReduce, Apache Spark, and Apache Storm. There is not yet one framework that "does it all", but this paper provides insight into each tool's strengths and weaknesses along with guidance on tool choice for specific needs.
Keywords
"Sparks","Clustering algorithms","Water","Big data","Machine learning algorithms","Data models","Engines"
Publisher
IEEE
Conference_Titel
Information Reuse and Integration (IRI), 2015 IEEE International Conference on
Type
conf
DOI
10.1109/IRI.2015.12
Filename
7300948