مرکز منطقه ای اطلاع رساني علوم و فناوري - Improving the Performance of the Symmetric Sparse Matrix-Vector Multiplication in Multicore

DocumentCode :

625594

Title :

Improving the Performance of the Symmetric Sparse Matrix-Vector Multiplication in Multicore

Author :

Gkountouvas, T. ; Karakasis, Vasileios ; Kourtis, Kornilios ; Goumas, Georgios ; Koziris, Nectarios

Author_Institution :

Dept. of Comput. Sci., Cornell Univ., Ithaca, NY, USA

fYear :

2013

fDate :

20-24 May 2013

Firstpage :

273

Lastpage :

283

Abstract :

Symmetric sparse matrices arise often in the solution of sparse linear systems. Exploiting the non-zero element symmetry in order to reduce the overall matrix size is very tempting for optimizing the symmetric Sparse Matrix-Vector Multiplication kernel (SpMxV) for multicore architectures. Despite being very beneficial for the single-threaded execution, not storing the upper or lower triangular part of a symmetric sparse matrix complicates the multithreaded SpMxV version, since it introduces an undesirable dependency on the output vector elements. The most common approach for overcoming this problem is to use local, per-thread vectors, which are reduced to the output vector at the end of the computation. However, this reduction leads to considerable memory traffic, limiting the scalability of the symmetric SpMxV. In this paper, we take a two-step approach in optimizing the symmetric SpMxV kernel. First, we introduce the CSX-Sym variant of the highly compressed CSX format, which exploits the non-zero element symmetry for compressing further the input matrix. Second, we minimize the memory traffic produced by the local vectors reduction phase by implementing a non-zero indexing compression scheme that minimizes the local data to be reduced. Our indexing scheme allowed the scaling of symmetric SpMxV and provided a more than 2x performance improvement over the baseline CSR implementation and 83.9% over the typical symmetric SpMxV kernel. The CSX-Sym variant has further increased the symmetric SpMxV performance by 43.4%. Finally, we evaluate the effect of our optimizations in the context of the CG iterative method, where we achieve an 77.8% acceleration of the overall solver.

Keywords :

indexing; matrix multiplication; multiprocessing systems; performance evaluation; sparse matrices; vectors; CG iterative method; CSX-Sym; baseline CSR implementation; compressed CSX format; indexing scheme; input matrix compression; local data minimization; local vector reduction phase; memory traffic minimization; multicore architectures; multithreaded SpMxV version; nonzero element symmetry; nonzero indexing compression scheme; symmetric SpMxV kernel; symmetric sparse matrix-vector multiplication kernel; Finite element analysis; Indexing; Instruction sets; Kernel; Sparse matrices; Symmetric matrices; Vectors; SpMV; compression; multicore optimization; sparse matrix-vector multiplication; symmetric sparse matrices;

fLanguage :

English

Publisher :

ieee

Conference_Titel :

Parallel & Distributed Processing (IPDPS), 2013 IEEE 27th International Symposium on

Conference_Location :

Boston, MA

ISSN :

1530-2075

Print_ISBN :

978-1-4673-6066-1

Type :

conf

DOI :

10.1109/IPDPS.2013.43

Filename :

6569818

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=625594