Title :
SystemML: Declarative machine learning on MapReduce
Author :
Ghoting, Amol ; Krishnamurthy, Rajasekar ; Pednault, Edwin ; Reinwald, Berthold ; Sindhwani, Vikas ; Tatikonda, Shirish ; Tian, Yuanyuan ; Vaithyanathan, Shivakumar
Abstract :
MapReduce is emerging as a generic parallel programming paradigm for large clusters of machines. This trend, combined with the growing need to run machine learning (ML) algorithms on massive datasets, has led to increased interest in implementing ML algorithms on MapReduce. However, the cost of implementing a large class of ML algorithms as low-level MapReduce jobs on varying data and machine cluster sizes can be prohibitive. In this paper, we propose SystemML, in which ML algorithms are expressed in a higher-level language and are compiled and executed in a MapReduce environment. This higher-level language exposes several constructs, including linear algebra primitives that constitute key building blocks for a broad class of supervised and unsupervised ML algorithms. The algorithms expressed in SystemML are compiled and optimized into a set of MapReduce jobs that can run on a cluster of machines. We describe and empirically evaluate a number of optimization strategies for efficiently executing these algorithms on Hadoop, an open-source MapReduce implementation. We report an extensive performance evaluation of three ML algorithms on varying data and cluster sizes.
Keywords :
data analysis; high level languages; learning (artificial intelligence); linear algebra; optimisation; parallel programming; SystemML; data cluster; declarative machine learning; higher level language; machine cluster; open source MapReduce; optimization strategy; supervised ML algorithm; unsupervised ML algorithm; Clustering algorithms; Computer architecture; Machine learning; Machine learning algorithms; Optimization; Runtime; Semantics
Conference_Title :
Data Engineering (ICDE), 2011 IEEE 27th International Conference on
Conference_Location :
Hannover
Print_ISBN :
978-1-4244-8959-6
ISSN :
1063-6382
DOI :
10.1109/ICDE.2011.5767930