Title :
Optimizing sparse matrix vector multiplication on emerging multicores
Author :
Kislal, Orhan ; Wei Ding ; Kandemir, Mahmut ; Demirkiran, Ilteris
Author_Institution :
Dept. of Comput. Sci. & Eng., Pennsylvania State Univ., University Park, PA, USA
Abstract :
After hitting the power wall, the dramatic change in computer architecture from single core to multicore/manycore brings us new challenges on high performance computing, especially for the data intensive applications. Sparse matrix-vector multiplication (SpMV) is one of the most important computations in this area, and has therefore received a lot of attention in recent decades. In contrast to the uniform/regular dense matrix computations, SpMV´s irregular data access patterns with compact data structure for storage make the SpMV optimization more complex than optimizing regular/dense matrix computation. In this work, we look at the SpMV optimization problem in the context of emerging multicores from a different architecture conscious perspective, and propose an optimization strategy that has three key components: mapping, scheduling and data layout reorganization. Specifically, the mapping component derives a suitable iteration-to-core mapping; the scheduling component determines the execution order of loop iterations assigned to each core in the target multicore architecture; and finally, the data layout reorganization component prepares multiple memory layouts for the source (input) vector customized for different row patterns. A distinguishing characteristic of our approach is that it is cache hierarchy aware, that is, all three components take the underlying cache hierarchy of the target multicore architecture into account, and therefore, the derived solution is, in a sense, customized to the target architecture. We evaluate the proposed strategy using 10 sparse matrices with two different multicore systems. Our experimental evaluation reveals that the proposed optimization algorithm brings significant performance improvements (up to 26.5%) over the unoptimized case.
Keywords :
cache storage; matrix decomposition; multiprocessing systems; optimisation; optimising compilers; parallel architectures; processor scheduling; sparse matrices; vectors; SpMV optimization; cache hierarchy aware; compact data structure; computer architecture; data intensive applications; data layout reorganization component; execution order; high performance computing; irregular data access patterns; loop iterations; mapping component; memory layouts; multicore systems; multicores; optimization algorithm; optimization strategy; optimizing sparse matrix vector multiplication; power wall; regular dense matrix computations; regular matrix computation; row patterns; scheduling component; source input vector; sparse matrices; sparse matrix-vector multiplication; suitable iteration-to-core mapping; target multicore architecture; uniform dense matrix computations; Layout; Multicore processing; Optimization; Processor scheduling; Sparse matrices; Vectors; Compiler Optimization; Data Locality; Data Transformation; Loop Transformation; Multicore; Sparse Matrix-Vector Multiplication;
Conference_Titel :
Multi-/Many-core Computing Systems (MuCoCoS), 2013 IEEE 6th International Workshop on
Conference_Location :
Edinburgh
DOI :
10.1109/MuCoCoS.2013.6633600