DocumentCode :
3292127
Title :
Accelerating a Sparse Matrix Iterative Solver Using a High Performance Reconfigurable Computer
Author :
Morris, G.R. ; McGruder, R.Y. ; Abed, K.H.
Author_Institution :
Eng. R&D Center, DoD Supercomput. Resource Center (ERDC DSRC), US Army, Vicksburg, MS, USA
fYear :
2010
fDate :
14-17 June 2010
Firstpage :
517
Lastpage :
523
Abstract :
High performance reconfigurable computers (HPRCs), which combine general-purpose processors (GPPs) and field programmable gate arrays (FPGAs), are now commercially available. These hybrid architectures allow for the creation of reconfigurable processors. HPRCs have already been used to accelerate integer and fixed-point applications. However, extensive parallelism and deeply pipelined floating-point cores are necessary to make MHz-scale FPGAs competitive with GHz-scale GPPs, which makes certain kinds of floating-point kernels difficult to accelerate. Kernels with variable-length nested loops, e.g., sparse matrix-vector multiply, have been problematic because of the loop-carried dependence associated with the pipelined floating-point units. While hardware description language (HDL)-based kernels have shown moderate success in addressing this problem, a high-level language (HLL)-based approach to accelerating such applications has been rather elusive. If HPRCs are to become a part of mainstream military and scientific computing, we should emphasize HLL-based programming, whenever possible, rather than HDL-based hardware design. The primary reason is the increased programmer productivity associated with HLLs when compared with HDLs. For example, the floating-point addition statement z = x+y, a single line in an HLL, corresponds to hundreds of lines of HDL. In this paper, we describe the design and implementation of a sparse matrix Jacobi processor to solve systems of linear equations, Ax=b. The parallelized, deeply pipelined, IEEE-754-compliant 32-bit floating-point sparse matrix Jacobi iterative solver runs on a contemporary HPRC. The FPGA-based components are implemented using only an HLL (the C programming language) and the Carte HLL-to-HDL compiler. An HLL-based streaming accumulator allows for the implementation of fully pipelined loops and results in a 2.5-fold wall clock runtime speedup when compared with an equivalent software-only implementation.
Keywords :
C language; Jacobian matrices; field programmable gate arrays; floating point arithmetic; hardware description languages; iterative methods; mathematics computing; program compilers; C programming language; Carte HLL-to-HDL compiler; HLL-based programming; HLL-based streaming accumulator; field programmable gate arrays; fixed-point application; floating-point addition; general-purpose processor; hardware description language; high performance reconfigurable computer; high-level language; integer application; pipelined floating-point unit; sparse matrix Jacobi processor; sparse matrix iterative solver; Computers; Field programmable gate arrays; Hardware; Jacobian matrices; Kernel; Program processors; Sparse matrices; FPGA; iterative solver; reconfigurable computer; sparse matrix;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
High Performance Computing Modernization Program Users Group Conference (HPCMP-UGC), 2010 DoD
Conference_Location :
Schaumburg, IL
Print_ISBN :
978-1-61284-986-7
Type :
conf
DOI :
10.1109/HPCMP-UGC.2010.30
Filename :
6018033