Title :
Dataflow acceleration of Krylov subspace sparse banded problems
Author :
Burovskiy, Pavel ; Girdlestone, Stephen ; Davies, Claire ; Sherwin, Spencer ; Luk, Wayne
Author_Institution :
Dept. of Comput., Imperial Coll. London, London, UK
Abstract :
Most efforts in the FPGA community related to sparse linear algebra focus on increasing the degree of internal parallelism in matrix-vector multiply kernels. We propose a parametrisable dataflow architecture that offers an alternative and complementary approach to accelerating banded sparse linear algebra problems that benefit from building a Krylov subspace. We use the banded structure of a matrix A to overlap the computations Ax, A^2 x, ..., A^k x by building a pipeline of matrix-vector multiplication processing elements (PEs), each performing A^i x. Due to on-chip data locality, the FLOPS rate sustainable by such a pipeline scales linearly with k. Our approach enables a trade-off between the number k of overlapped matrix power actions and the level of parallelism within a PE. We illustrate our approach with Google PageRank computation by power iteration for large banded single precision sparse matrices. Our design scales up to 32 sequential PEs with floating point accumulation and 80 PEs with fixed point accumulation on a Stratix V D8 FPGA. With 80 single-pipe fixed point PEs clocked at 160 MHz, our design sustains 12.7 GFLOPS.
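The overlap of Ax, A^2 x, ..., A^k x described above can be modelled in software. The Python sketch below is a hypothetical illustration, not the authors' FPGA design: each PE is modelled as a generator that consumes a vector in row order and, because A has half-bandwidth w, can emit row r of its output as soon as input element r + w has arrived, so it only buffers a window of 2w + 1 values. Chaining k such stages yields A^k x with all stages in flight at once. The names `to_bands`, `spmv_stage` and `matrix_power_pipeline` and the n x (2w + 1) band storage layout are assumptions made for this sketch; PageRank damping, normalisation and fixed point accumulation are omitted.

```python
from collections import deque
from typing import Iterable, Iterator

import numpy as np


def to_bands(A: np.ndarray, w: int) -> np.ndarray:
    """Hypothetical helper: pack a dense banded matrix into an n x (2w+1) array,
    bands[r, j] = A[r, r - w + j], with zeros where the column falls outside A."""
    n = A.shape[0]
    bands = np.zeros((n, 2 * w + 1))
    for r in range(n):
        for j in range(2 * w + 1):
            c = r - w + j
            if 0 <= c < n:
                bands[r, j] = A[r, c]
    return bands


def spmv_stage(bands: np.ndarray, w: int, xs: Iterable[float]) -> Iterator[float]:
    """One modelled PE: consume x in row order and emit y = A x in row order.
    Only the last 2w+1 input values are kept (the 'on-chip' window)."""
    n = bands.shape[0]
    buf: deque = deque(maxlen=2 * w + 1)

    def row(r: int, first: int) -> float:
        # Dot product of band row r with x[max(0, r-w) .. min(n-1, r+w)];
        # `first` is the global index of buf[0].
        lo, hi = max(0, r - w), min(n - 1, r + w)
        return sum(bands[r, c - r + w] * buf[c - first] for c in range(lo, hi + 1))

    for i, x_i in enumerate(xs):
        buf.append(x_i)
        if i >= w:  # x[r + w] has arrived, so row r = i - w is complete
            yield row(i - w, i - len(buf) + 1)
    # Drain: the last w rows (or all rows when n <= w) need no further input.
    for r in range(max(0, n - w), n):
        yield row(r, n - len(buf))


def matrix_power_pipeline(bands: np.ndarray, w: int, x: np.ndarray, k: int) -> np.ndarray:
    """Chain k modelled PEs: stage i streams A^i x into stage i+1, so all k
    stages are active concurrently, each lagging its predecessor by w rows."""
    stream: Iterable[float] = iter(x)
    for _ in range(k):
        stream = spmv_stage(bands, w, stream)
    return np.fromiter(stream, dtype=float, count=len(x))


rng = np.random.default_rng(0)
n, w, k = 40, 3, 5
A = np.triu(np.tril(rng.random((n, n)), w), -w)  # keep only the band |r - c| <= w
x = rng.random(n)
y = matrix_power_pipeline(to_bands(A, w), w, x, k)
print(np.allclose(y, np.linalg.matrix_power(A, k) @ x))
```

In this model, each stage lags its predecessor by only w rows, so the last stage produces its first result after k*w + 1 input values have entered the pipeline rather than after k full passes over x. When the stages run as parallel hardware, this overlap is what lets the sustained FLOPS rate grow with the number k of chained PEs.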
Keywords :
data flow computing; field programmable gate arrays; mathematics computing; matrix multiplication; pipeline processing; sparse matrices; FLOPS rate; Google PageRank computation; Krylov subspace sparse banded problems; Stratix V D8 FPGA; banded sparse linear algebra problems; dataflow acceleration; fixed point accumulation; floating point accumulation; frequency 160 MHz; large banded single precision sparse matrices; matrix banded structure; matrix power actions; matrix-vector multiplication processing elements; on-chip data locality; parametrisable dataflow architecture; power iteration; sequential PE; single-pipe fixed point PE; Computer architecture; Field programmable gate arrays; Pipelines; Random access memory; Sparse matrices; System-on-chip; Vectors; Krylov subspace; SpMV; dataflow; iterative solvers; matrix exponentials; matrix powers; performance model; sparse matrix;
Conference_Title :
Field Programmable Logic and Applications (FPL), 2014 24th International Conference on
Conference_Location :
Munich
DOI :
10.1109/FPL.2014.6927453