DocumentCode :
3146420
Title :
High Performance Data Mining Using R on Heterogeneous Platforms
Author :
Kumar, Prabhat ; Ozisikyilmaz, Berkin ; Liao, Wei-keng ; Memik, Gokhan ; Choudhary, Alok
Author_Institution :
Dept. of Electr. Eng. & Comput. Sci., Northwestern Univ., Evanston, IL, USA
fYear :
2011
fDate :
16-20 May 2011
Firstpage :
1720
Lastpage :
1729
Abstract :
The exponential increase in the generation and collection of data has led us into a new era of data analysis and information extraction. Conventional systems based on general-purpose processors are unable to keep pace with the heavy computational requirements of data mining techniques. High performance co-processors like GPUs and FPGAs have the potential to handle large computational workloads. In this paper, we present a scalable framework aimed at providing a platform for developing and using high performance data mining applications on heterogeneous platforms. The framework incorporates a software infrastructure and a library of high performance kernels. Furthermore, it includes a variety of optimizations which increase the throughput of applications. The framework spans multiple technologies, including R, GPUs, multi-core CPUs, MPI, and parallel netCDF, harnessing their capabilities for high-performance computations. This paper also introduces the concept of interleaving GPU kernels from multiple applications, providing significant performance gain. Thus, in comparison to other tools available for data mining, our framework provides an easy-to-use and scalable environment both for application development and execution. The framework is available as a software package which can be easily integrated into the R programming environment.
Keywords :
coprocessors; data mining; field programmable gate arrays; multiprocessing programs; parallel processing; FPGA; GPU; MPI; R programming environment; data analysis; general-purpose processors; heterogeneous platforms; high performance coprocessors; high performance data mining; high performance kernels; information extraction; multicore CPU; optimizations; parallel netCDF; scalable framework; software infrastructure; Computer architecture; Data mining; Graphics processing unit; Kernel; Libraries; Optimization; Parallel processing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), 2011 IEEE International Symposium on
Conference_Location :
Shanghai
ISSN :
1530-2075
Print_ISBN :
978-1-61284-425-1
Electronic_ISBN :
1530-2075
Type :
conf
DOI :
10.1109/IPDPS.2011.329
Filename :
6009038