DocumentCode
1829462
Title
Accelerating high performance applications with CUDA and MPI
Author
Karunadasa, N.P. ; Ranasinghe, D.N.
Author_Institution
Univ. of Colombo Sch. of Comput., Colombo, Sri Lanka
fYear
2009
fDate
28-31 Dec. 2009
Firstpage
331
Lastpage
336
Abstract
Graphics Processing Units (GPUs) programmed with the Compute Unified Device Architecture (CUDA) are rapidly becoming a major choice in high performance computing. Hence, the number of applications ported to the CUDA platform is growing. The Message Passing Interface (MPI) has been the choice for high performance computing for more than a decade and has proven its capability to deliver high performance in parallel applications. CUDA and MPI use different programming approaches, but both depend on the inherent parallelism of the application to be effective. However, little research has been carried out to evaluate performance when CUDA is integrated with other parallel programming paradigms. This paper investigates the integration of the two programming approaches and how it can achieve superior performance in general purpose applications. We experimented with the CUDA+MPI programming approach on two well-known algorithms (Strassen's algorithm and the Conjugate Gradient algorithm) and show how higher performance can be achieved by using MPI as the computation distributing mechanism and CUDA as the main execution engine. We developed a general purpose matrix multiplication algorithm and a Conjugate Gradient algorithm using CUDA and MPI. In this approach, MPI functions as the data distributing mechanism between the GPU nodes and CUDA as the main computing engine. This allows the programmer to connect GPU nodes via high speed Ethernet without special technologies. The programmer can thus view each GPU node separately and execute different components of a program on several GPU nodes.
Keywords
computer graphics; conjugate gradient methods; coprocessors; local area networks; matrix multiplication; message passing; parallel architectures; parallel programming; Ethernet; computation distributing mechanism; compute unified device architecture; conjugate gradient algorithm; graphic processing units; high performance computing; matrix multiplication algorithm; message passing interface; parallel programming paradigms; strassens algorithm; Acceleration; Computer architecture; Distributed computing; Engines; Ethernet networks; High performance computing; Message passing; Parallel processing; Parallel programming; Programming profession; CUDA; High Performance Computing; MPI;
fLanguage
English
Publisher
IEEE
Conference_Titel
Industrial and Information Systems (ICIIS), 2009 International Conference on
Conference_Location
Sri Lanka
Print_ISBN
978-1-4244-4836-4
Electronic_ISBN
978-1-4244-4837-1
Type
conf
DOI
10.1109/ICIINFS.2009.5429842
Filename
5429842