Title :
Improving the Performance of the Symmetric Sparse Matrix-Vector Multiplication in Multicore
Author :
Gkountouvas, T. ; Karakasis, Vasileios ; Kourtis, Kornilios ; Goumas, Georgios ; Koziris, Nectarios
Author_Institution :
Dept. of Comput. Sci., Cornell Univ., Ithaca, NY, USA
Abstract :
Symmetric sparse matrices arise often in the solution of sparse linear systems. Exploiting the non-zero element symmetry in order to reduce the overall matrix size is very tempting for optimizing the symmetric Sparse Matrix-Vector Multiplication kernel (SpMxV) for multicore architectures. Despite being very beneficial for the single-threaded execution, not storing the upper or lower triangular part of a symmetric sparse matrix complicates the multithreaded SpMxV version, since it introduces an undesirable dependency on the output vector elements. The most common approach for overcoming this problem is to use local, per-thread vectors, which are reduced to the output vector at the end of the computation. However, this reduction leads to considerable memory traffic, limiting the scalability of the symmetric SpMxV. In this paper, we take a two-step approach in optimizing the symmetric SpMxV kernel. First, we introduce the CSX-Sym variant of the highly compressed CSX format, which exploits the non-zero element symmetry for compressing further the input matrix. Second, we minimize the memory traffic produced by the local vectors reduction phase by implementing a non-zero indexing compression scheme that minimizes the local data to be reduced. Our indexing scheme allowed the scaling of symmetric SpMxV and provided a more than 2x performance improvement over the baseline CSR implementation and 83.9% over the typical symmetric SpMxV kernel. The CSX-Sym variant has further increased the symmetric SpMxV performance by 43.4%. Finally, we evaluate the effect of our optimizations in the context of the CG iterative method, where we achieve an 77.8% acceleration of the overall solver.
Keywords :
indexing; matrix multiplication; multiprocessing systems; performance evaluation; sparse matrices; vectors; CG iterative method; CSX-Sym; baseline CSR implementation; compressed CSX format; indexing scheme; input matrix compression; local data minimization; local vector reduction phase; memory traffic minimization; multicore architectures; multithreaded SpMxV version; nonzero element symmetry; nonzero indexing compression scheme; symmetric SpMxV kernel; symmetric sparse matrix-vector multiplication kernel; Finite element analysis; Indexing; Instruction sets; Kernel; Sparse matrices; Symmetric matrices; Vectors; SpMV; compression; multicore optimization; sparse matrix-vector multiplication; symmetric sparse matrices;
Conference_Titel :
Parallel & Distributed Processing (IPDPS), 2013 IEEE 27th International Symposium on
Conference_Location :
Boston, MA
Print_ISBN :
978-1-4673-6066-1
DOI :
10.1109/IPDPS.2013.43