Big R: Large-Scale Analytics on Hadoop Using R

Author

Lara Yejas, Oscar D.; Zhuang, Weiqiang; Pannu, Adarsh

Author_Institution

IBM Silicon Valley Lab., San Jose, CA, USA

fYear

2014

fDate

June 27, 2014 - July 2, 2014

Firstpage

570

Lastpage

577

Abstract

As the volume of available data from a variety of sources continues to grow rapidly, scalable and performant analytics solutions have become essential tools for enhancing business productivity and revenue. Existing data analysis environments, such as R, are constrained by the size of main memory and cannot scale in many applications. This paper introduces Big R, a new platform that enables accessing, manipulating, analyzing, and visualizing data residing on a Hadoop cluster from the R user interface. Big R is inspired by R semantics and overloads a number of R primitives to support big data. As a result, users can quickly prototype big data analytics routines without having to learn a new programming paradigm. The current Big R implementation works on two main fronts: (1) data exploration, which enables the use of R as a query language for Hadoop, and (2) partitioned execution, which allows any R function to be executed on smaller pieces of a large dataset across the nodes in the cluster.

Keywords

data analysis; data visualisation; pattern clustering; public domain software; query languages; user interfaces; Big R; Hadoop cluster; R primitives; R semantics; R user interface; data exploration; data manipulation; data visualization; large-scale analytics; partitioned execution; query language; Big data; Data mining; Database languages; Delays; Semantics; Vectors; Machine Learning; MapReduce

fLanguage

English

Publisher

IEEE

Conference_Titel

Big Data (BigData Congress), 2014 IEEE International Congress on

Conference_Location

Anchorage, AK

Print_ISBN

978-1-4799-5056-0

Type

conf

DOI

10.1109/BigData.Congress.2014.88

Filename

6906830