DocumentCode :
633092
Title :
RABID -- A General Distributed R Processing Framework Targeting Large Data-Set Problems
Author :
Hao Lin ; Shuo Yang ; Midkiff, Samuel P.
Author_Institution :
Electr. & Comput. Eng., Purdue Univ., West Lafayette, IN, USA
fYear :
2013
fDate :
June 27 2013-July 2 2013
Firstpage :
423
Lastpage :
424
Abstract :
Large-scale data mining and deep data analysis are in high demand in modern enterprises. This work describes the RABID (R Analytics for BIg Data) framework, which provides a highly parallel R environment. We give data analysts an easy-to-use R interface for performing deep data analysis on clusters by integrating R with a MapReduce-like platform. By leveraging a distributed runtime system, our framework enables R, a single-threaded language, to efficiently perform parallel analysis of data that cannot fit into a single shared-memory machine. Experiments with data mining benchmarks on our framework show promising results.
Keywords :
data analysis; data mining; distributed shared memory systems; MapReduce-like platform; R analytics for big data framework; RABID framework; data mining benchmark; deep data analysis; distributed runtime system; parallel analysis; shared memory machine; single-threaded language; Data analysis; Data mining; Distributed databases; Electronic mail; Programming; Runtime; Sparks; data mining; distributed systems; programming language;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Big Data (BigData Congress), 2013 IEEE International Congress on
Conference_Location :
Santa Clara, CA
Print_ISBN :
978-0-7695-5006-0
Type :
conf
DOI :
10.1109/BigData.Congress.2013.67
Filename :
6597171