Dataflow acceleration of Krylov subspace sparse banded problems

Author

Burovskiy, Pavel ; Girdlestone, Stephen ; Davies, Claire ; Sherwin, Spencer ; Luk, Wayne

Author_Institution

Dept. of Comput., Imperial Coll. London, London, UK

fYear

2014

fDate

2-4 Sept. 2014

Firstpage

1

Lastpage

6

Abstract

Most efforts in the FPGA community related to sparse linear algebra focus on increasing the degree of internal parallelism in matrix-vector multiply kernels. We propose a parametrisable dataflow architecture that offers an alternative and complementary approach to accelerating banded sparse linear algebra problems which benefit from building a Krylov subspace. We use the banded structure of a matrix A to overlap the computations Ax, A²x, ..., A^kx by building a pipeline of matrix-vector multiplication processing elements (PEs), each performing Aⁱx. Due to on-chip data locality, the FLOPS rate sustainable by such a pipeline scales linearly with k. Our approach enables a trade-off between the number k of overlapped matrix power actions and the level of parallelism in a PE. We illustrate our approach with Google PageRank computation by power iteration for large banded single-precision sparse matrices. Our design scales up to 32 sequential PEs with floating-point accumulation and 80 PEs with fixed-point accumulation on a Stratix V D8 FPGA. With 80 single-pipe fixed-point PEs clocked at 160 MHz, our design sustains 12.7 GFLOPS.
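
The overlap of Ax, A²x, ..., A^kx described in the abstract can be illustrated with a small software model. The sketch below is not the authors' hardware design: the names `bands`, `half_bw`, `krylov_sequential`, and `krylov_pipelined` are hypothetical, and the schedule (each PE lagging its predecessor by one half-bandwidth of rows) is an assumption chosen to show why a banded matrix allows the stages to run concurrently. Row r of Aⁱx needs only rows r - half_bw to r + half_bw of Aⁱ⁻¹x, so a stage can start before the previous stage has finished.

```python
def banded_matvec_row(bands, half_bw, x, row):
    """Return row `row` of A @ x for a banded matrix A.

    bands[row][d] holds the entry A[row, row - half_bw + d];
    columns falling outside the matrix are skipped.
    """
    n = len(x)
    acc = 0.0
    for d, a in enumerate(bands[row]):
        col = row - half_bw + d
        if 0 <= col < n:
            acc += a * x[col]
    return acc


def krylov_sequential(bands, half_bw, x, k):
    """Reference: compute [A x, A^2 x, ..., A^k x] one full product at a time."""
    n = len(x)
    out = []
    v = list(x)
    for _ in range(k):
        v = [banded_matvec_row(bands, half_bw, v, r) for r in range(n)]
        out.append(v)
    return out


def krylov_pipelined(bands, half_bw, x, k):
    """Same result, but with k stages ("PEs") working in an overlapped schedule.

    At step t, stage i (1-based) produces row t - i * half_bw of A^i x.
    Stage i - 1 is then exactly half_bw rows ahead, which is all that the
    band of A requires, so no stage waits for a complete input vector.
    """
    n = len(x)
    vecs = [list(x)] + [[None] * n for _ in range(k)]
    total_steps = n + k * half_bw  # versus k * n row-steps when run back to back
    for t in range(total_steps):
        for i in range(1, k + 1):  # ascending: stage i - 1 writes before stage i reads
            row = t - i * half_bw
            if 0 <= row < n:
                vecs[i][row] = banded_matvec_row(bands, half_bw, vecs[i - 1], row)
    return vecs[1:]
```

For example, with a tridiagonal matrix (`half_bw = 1`) of size 8 and `k = 3`, `krylov_pipelined` finishes in 11 steps of the outer loop, whereas running three full products back to back takes 24 row-steps. In this model the work done per step grows with k while the step count grows only by `k * half_bw`, which matches the abstract's claim that the sustainable rate scales with k.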

Keywords

data flow computing; field programmable gate arrays; mathematics computing; matrix multiplication; pipeline processing; sparse matrices; FLOPS rate; Google PageRank computation; Krylov subspace sparse banded problems; Stratix V D8 FPGA; banded sparse linear algebra problems; dataflow acceleration; fixed point accumulation; floating point accumulation; frequency 160 MHz; large banded single precision sparse matrices; matrix banded structure; matrix power actions; matrix-vector multiplication processing elements; on-chip data locality; parametrisable dataflow architecture; pipeline processing; power iteration; sequential PE; single-pipe fixed point PE; Computer architecture; Field programmable gate arrays; Pipelines; Random access memory; Sparse matrices; System-on-chip; Vectors; Krylov subspace; SpMV; dataflow; iterative solvers; matrix exponentials; matrix powers; performance model; sparse matrix;

fLanguage

English

Publisher

IEEE

Conference_Titel

Field Programmable Logic and Applications (FPL), 2014 24th International Conference on

Conference_Location

Munich

Type

conf

DOI

10.1109/FPL.2014.6927453

Filename

6927453