Accelerating high performance applications with CUDA and MPI

Author

Karunadasa, N.P. ; Ranasinghe, D.N.

Author_Institution

Univ. of Colombo Sch. of Comput., Colombo, Sri Lanka

fYear

2009

fDate

28-31 Dec. 2009

Firstpage

331

Lastpage

336

Abstract

Graphics Processing Units (GPUs) programmed with the Compute Unified Device Architecture (CUDA) are rapidly becoming a major choice in high performance computing, and the number of applications ported to the CUDA platform is growing accordingly. The Message Passing Interface (MPI) has been a standard choice in high performance computing for more than a decade and has proven its capability to deliver high performance in parallel applications. CUDA and MPI use different programming approaches, but both depend on the inherent parallelism of the application to be effective. However, much less research has been carried out to evaluate performance when CUDA is integrated with other parallel programming paradigms. This paper investigates how the capabilities of the two programming approaches can be integrated to achieve superior performance in general purpose applications. We experimented with the CUDA+MPI programming approach on two well-known algorithms (Strassen's algorithm and the conjugate gradient algorithm) and show how higher performance can be achieved by using MPI as the computation distributing mechanism and CUDA as the main execution engine. We developed a general purpose matrix multiplication algorithm and a conjugate gradient algorithm using CUDA and MPI. In this approach, MPI functions as the data distributing mechanism between the GPU nodes and CUDA as the main computing engine. This allows the programmer to connect GPU nodes via high speed Ethernet without special technologies. The programmer can thus view each GPU node separately and execute different components of a program on several GPU nodes.
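
The abstract does not say how work is divided among the GPU nodes. As a rough illustration only, the sketch below performs one level of Strassen's algorithm and exposes its seven block products as independent tasks, which is the kind of unit a distributor such as MPI could hand to separate nodes. All function names are hypothetical, the task-to-node mapping is an assumption rather than the authors' method, and plain Python stands in for both MPI and the CUDA kernel. Matrices are assumed to be square with an even dimension.

```python
# Illustrative sketch only: one level of Strassen's algorithm on 2x2 blocks.
# Assumption: each of the seven products could be sent to a different GPU
# node. No MPI or CUDA calls are made; plain Python stands in for both.

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def matmul(X, Y):
    # Naive product; a stand-in for the per-node GPU kernel.
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]

def split(M):
    # Split a square, even-sized matrix into four quadrants.
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])

def strassen_tasks(A, B):
    # Build the seven independent (left, right) operand pairs M1..M7.
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    return [
        (add(A11, A22), add(B11, B22)),  # M1
        (add(A21, A22), B11),            # M2
        (A11, sub(B12, B22)),            # M3
        (A22, sub(B21, B11)),            # M4
        (add(A11, A12), B22),            # M5
        (sub(A21, A11), add(B11, B12)),  # M6
        (sub(A12, A22), add(B21, B22)),  # M7
    ]

def combine(products):
    # Reassemble the result quadrants from the seven products.
    M1, M2, M3, M4, M5, M6, M7 = products
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom

def strassen_one_level(A, B):
    # In a distributed setting, each task would run on its own node.
    products = [matmul(X, Y) for X, Y in strassen_tasks(A, B)]
    return combine(products)
```

Because the seven products share no intermediate results, the list returned by `strassen_tasks` could be scattered to workers and the results gathered before `combine` runs on a single node.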

Keywords

computer graphics; conjugate gradient methods; coprocessors; local area networks; matrix multiplication; message passing; parallel architectures; parallel programming; Ethernet; computation distributing mechanism; compute unified device architecture; conjugate gradient algorithm; graphic processing units; high performance computing; matrix multiplication algorithm; message passing interface; parallel programming paradigms; Strassen's algorithm; Acceleration; Computer architecture; Distributed computing; Engines; Ethernet networks; High performance computing; Message passing; Parallel processing; Parallel programming; Programming profession; CUDA; High Performance Computing; MPI;

fLanguage

English

Publisher

ieee

Conference_Titel

Industrial and Information Systems (ICIIS), 2009 International Conference on

Conference_Location

Sri Lanka

Print_ISBN

978-1-4244-4836-4

Electronic_ISBN

978-1-4244-4837-1

Type

conf

DOI

10.1109/ICIINFS.2009.5429842

Filename

5429842