DocumentCode :
249447
Title :
Big R: Large-Scale Analytics on Hadoop Using R
Author :
Lara Yejas, Oscar D.; Zhuang, Weiqiang; Pannu, Adarsh
Author_Institution :
IBM Silicon Valley Lab., San Jose, CA, USA
fYear :
2014
fDate :
June 27 2014-July 2 2014
Firstpage :
570
Lastpage :
577
Abstract :
As the volume of available data from a variety of sources continues to grow rapidly, scalable and performant analytics solutions have become an essential tool to enhance business productivity and revenue. Existing data analysis environments, such as R, are constrained by the size of main memory and cannot scale in many applications. This paper introduces Big R, a new platform that enables accessing, manipulating, analyzing, and visualizing data residing on a Hadoop cluster from the R user interface. Big R is inspired by R semantics and overloads a number of R primitives to support big data. Hence, users can quickly prototype big data analytics routines without having to learn a new programming paradigm. The current Big R implementation works on two main fronts: (1) data exploration, which enables the use of R as a query language for Hadoop, and (2) partitioned execution, which allows any R function to be executed on smaller pieces of a large dataset across the nodes in the cluster.
Keywords :
data analysis; data visualisation; pattern clustering; public domain software; query languages; user interfaces; Big R; Hadoop cluster; R primitives; R semantics; R user interface; data exploration; data manipulation; data visualization; large-scale analytics; partitioned execution; query language; Big data; Data mining; Database languages; Delays; Semantics; Vectors; Machine Learning; MapReduce
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Big Data (BigData Congress), 2014 IEEE International Congress on
Conference_Location :
Anchorage, AK
Print_ISBN :
978-1-4799-5056-0
Type :
conf
DOI :
10.1109/BigData.Congress.2014.88
Filename :
6906830