• DocumentCode
    249447
  • Title
    Big R: Large-Scale Analytics on Hadoop Using R
  • Author
    Lara Yejas, Oscar D.; Zhuang, Weiqiang; Pannu, Adarsh
  • Author_Institution
    IBM Silicon Valley Lab., San Jose, CA, USA
  • fYear
    2014
  • fDate
    June 27-July 2, 2014
  • Firstpage
    570
  • Lastpage
    577
  • Abstract
    As the volume of available data from a variety of sources continues to grow rapidly, scalable and performant analytics solutions have become essential tools for enhancing business productivity and revenue. Existing data analysis environments, such as R, are constrained by the size of main memory and cannot scale in many applications. This paper introduces Big R, a new platform that enables accessing, manipulating, analyzing, and visualizing data residing on a Hadoop cluster from the R user interface. Big R is inspired by R semantics and overloads a number of R primitives to support big data. Hence, users can quickly prototype big data analytics routines without having to learn a new programming paradigm. The current Big R implementation works on two main fronts: (1) data exploration, which enables R as a query language for Hadoop, and (2) partitioned execution, which allows any R function to be executed on smaller pieces of a large dataset across the nodes in the cluster.
  • Keywords
    data analysis; data visualisation; pattern clustering; public domain software; query languages; user interfaces; Big R; Hadoop cluster; R primitives; R semantics; R user interface; data exploration; data manipulation; data visualization; large-scale analytics; partitioned execution; query language; Big data; Data mining; Database languages; Delays; Semantics; Vectors; Machine Learning; MapReduce
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2014 IEEE International Congress on Big Data (BigData Congress)
  • Conference_Location
    Anchorage, AK
  • Print_ISBN
    978-1-4799-5056-0
  • Type
    conf
  • DOI
    10.1109/BigData.Congress.2014.88
  • Filename
    6906830