DocumentCode :
2459032
Title :
Scalable and Numerically Stable Descriptive Statistics in SystemML
Author :
Tian, Yuanyuan ; Tatikonda, Shirish ; Reinwald, Berthold
Author_Institution :
IBM Almaden Research Center, San Jose, CA, USA
fYear :
2012
fDate :
1-5 April 2012
Firstpage :
1351
Lastpage :
1359
Abstract :
With the exponential growth in the amount of data that is being generated in recent years, there is a pressing need for applying machine learning algorithms to large data sets. SystemML is a framework that employs a declarative approach for large scale data analytics. In SystemML, machine learning algorithms are expressed as scripts in a high-level language, called DML, which is syntactically similar to R. DML scripts are compiled, optimized, and executed in the SystemML runtime that is built on top of MapReduce. As the basis of virtually every quantitative analysis, descriptive statistics provide powerful tools to explore data in SystemML. In this paper, we describe our experience in implementing descriptive statistics in SystemML. In particular, we elaborate on how to overcome the two major challenges: (1) achieving numerical stability while operating on large data sets in a distributed setting of MapReduce, and (2) designing scalable algorithms to compute order statistics in MapReduce. By empirically comparing to algorithms commonly used in existing tools and systems, we demonstrate the numerical accuracy achieved by SystemML. We also highlight the valuable lessons we have learned in this exercise.
Keywords :
data analysis; learning (artificial intelligence); numerical stability; specification languages; statistical analysis; DML; MapReduce; SystemML; declarative approach; high-level language; large scale data analytics; machine learning; numerical stability; numerically stable descriptive statistics; order statistics; scalable descriptive statistics; Accuracy; Approximation algorithms; Correlation; Equations; Higher order statistics; Numerical stability; Standards
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Engineering (ICDE), 2012 IEEE 28th International Conference on
Conference_Location :
Washington, DC
ISSN :
1063-6382
Print_ISBN :
978-1-4673-0042-1
Type :
conf
DOI :
10.1109/ICDE.2012.12
Filename :
6228204