DocumentCode :
3538372
Title :
JRBridge: A Framework of Large-Scale Statistical Computing for R
Author :
Xia Xie ; Jie Cao ; Hai Jin ; Xijiang Ke ; Wenzhi Cao
Author_Institution :
Services Comput. Technol. & Syst. Lab., Huazhong Univ. of Sci. & Technol., Wuhan, China
fYear :
2012
fDate :
6-8 Dec. 2012
Firstpage :
27
Lastpage :
34
Abstract :
Demand for highly scalable parallel data processing platforms is rising due to an explosion in the number of massive-scale, data-intensive applications in both industry and the sciences. Performing statistical computing over huge data repositories poses a significant challenge to existing statistical software and computational infrastructure. An analysis of various open-source computational infrastructures and their programming-paradigm APIs shows that most of them are JVM-based and expose their APIs as Java interfaces or abstract classes. This paper proposes a generic framework, JRBridge, which integrates R with JVM-based computational infrastructures by automatically generating Java API wrapper code around native R code and handling type conversion. Using this framework, we build a distributed statistical computing environment by integrating R with Hadoop. The Hadoop Distributed File System plug-in provides a way to store and access datasets with millions of objects. The MapReduce plug-in provides a natural environment for coding MapReduce algorithms in R. Experimental results show that JRBridge scales linearly with dataset size and thus provides a scalable solution for large-scale statistical computing in R.
Keywords :
application program interfaces; mathematics computing; parallel processing; statistical analysis; API; Hadoop distributed file system; JRBridge framework; Java API code wrapper; MapReduce algorithm; large-scale statistical computing; massive-scale data intensive application; parallel data processing platform; Algorithm design and analysis; Bridges; Computational modeling; Java; Libraries; Programming; Storms; Hadoop; JVM; MapReduce; R Language; Statistical Computing Method;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Services Computing Conference (APSCC), 2012 IEEE Asia-Pacific
Conference_Location :
Guilin
Print_ISBN :
978-1-4673-4825-6
Type :
conf
DOI :
10.1109/APSCC.2012.74
Filename :
6478195