DocumentCode :
1829462
Title :
Accelerating high performance applications with CUDA and MPI
Author :
Karunadasa, N.P. ; Ranasinghe, D.N.
Author_Institution :
Univ. of Colombo Sch. of Comput., Colombo, Sri Lanka
fYear :
2009
fDate :
28-31 Dec. 2009
Firstpage :
331
Lastpage :
336
Abstract :
Graphics Processing Units (GPUs) programmed with the Compute Unified Device Architecture (CUDA) are rapidly becoming a major choice in high performance computing, and the number of applications ported to the CUDA platform is growing accordingly. The Message Passing Interface (MPI) has been a mainstay of high performance computing for more than a decade and has proven its capability to deliver higher performance in parallel applications. CUDA and MPI use different programming approaches, but both depend on the inherent parallelism of the application to be effective. However, little research has been carried out to evaluate performance when CUDA is integrated with other parallel programming paradigms. This paper investigates how the capabilities of both programming approaches can be integrated to achieve superior performance in general purpose applications. We experimented with a combined CUDA+MPI programming approach on two well-known algorithms (Strassen's algorithm and the Conjugate Gradient algorithm) and show that higher performance can be achieved by using MPI as the computation distributing mechanism and CUDA as the main execution engine. We have developed a general purpose matrix multiplication algorithm and a Conjugate Gradient algorithm using CUDA and MPI. In this approach, MPI functions as the data distributing mechanism between the GPU nodes and CUDA as the main computing engine. This allows the programmer to connect GPU nodes via high speed Ethernet without special technologies. The programmer can thus view each GPU node separately and execute different components of a program on several GPU nodes.
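The division of labour described in the abstract can be illustrated with a small sketch. The abstract does not say how the work is split between nodes, so the following is an assumption made purely for illustration: one level of Strassen's algorithm produces seven independent block products, and each product is handed to a node. Plain Python stands in for both layers here: the round-robin `assignment` plays the role MPI would play, and `local_multiply` stands in for the multiply that a GPU node would run as a CUDA kernel. All function names are hypothetical and are not taken from the paper.

```python
# Sketch only: pure Python stands in for MPI (task distribution) and
# CUDA (the per-node multiply). Every name here is hypothetical.

def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def mat_sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def local_multiply(a, b):
    """Stand-in for the multiply a GPU node would run as a CUDA kernel."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def split(m):
    """Split an even-sized square matrix into its four quadrants."""
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])

def strassen_tasks(a, b):
    """Build the seven independent operand pairs of one Strassen level."""
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    return [
        (mat_add(a11, a22), mat_add(b11, b22)),  # M1
        (mat_add(a21, a22), b11),                # M2
        (a11, mat_sub(b12, b22)),                # M3
        (a22, mat_sub(b21, b11)),                # M4
        (mat_add(a11, a12), b22),                # M5
        (mat_sub(a21, a11), mat_add(b11, b12)),  # M6
        (mat_sub(a12, a22), mat_add(b21, b22)),  # M7
    ]

def strassen_distributed(a, b, num_nodes=4):
    tasks = strassen_tasks(a, b)
    # Assumed policy: round-robin assignment of products to simulated nodes.
    assignment = {i: i % num_nodes for i in range(len(tasks))}
    # In a real setup, product i would be sent to node assignment[i];
    # here every product is simply computed in this process.
    m1, m2, m3, m4, m5, m6, m7 = [local_multiply(x, y) for x, y in tasks]
    c11 = mat_add(mat_sub(mat_add(m1, m4), m5), m7)
    c12 = mat_add(m3, m5)
    c21 = mat_add(m2, m4)
    c22 = mat_add(mat_add(mat_sub(m1, m2), m3), m6)
    top = [r1 + r2 for r1, r2 in zip(c11, c12)]
    bottom = [r1 + r2 for r1, r2 in zip(c21, c22)]
    return top + bottom, assignment
```

Because the seven products share no intermediate results, they can be computed on separate nodes and only the quadrant recombination needs the results gathered in one place, which is one plausible reason a distribute-then-compute split suits this algorithm.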
Keywords :
computer graphics; conjugate gradient methods; coprocessors; local area networks; matrix multiplication; message passing; parallel architectures; parallel programming; Ethernet; computation distributing mechanism; compute unified device architecture; conjugate gradient algorithm; graphic processing units; high performance computing; matrix multiplication algorithm; message passing interface; parallel programming paradigms; strassens algorithm; Acceleration; Computer architecture; Distributed computing; Engines; Ethernet networks; High performance computing; Message passing; Parallel processing; Parallel programming; Programming profession; CUDA; High Performance Computing; MPI;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Industrial and Information Systems (ICIIS), 2009 International Conference on
Conference_Location :
Sri Lanka
Print_ISBN :
978-1-4244-4836-4
Electronic_ISBN :
978-1-4244-4837-1
Type :
conf
DOI :
10.1109/ICIINFS.2009.5429842
Filename :
5429842