A Hierarchical Tridiagonal System Solver for Heterogenous Supercomputers

Author

Xinliang Wang ; Yangtong Xu ; Wei Xue

Author_Institution

Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China

fYear

2014

fDate

17-17 Nov. 2014

Firstpage

69

Lastpage

76

Abstract

The tridiagonal system solver is an important kernel in many scientific and engineering applications. Although quite a few parallel algorithms and implementations have been proposed in recent years, challenges remain when solving large-scale tridiagonal systems on heterogeneous supercomputers. In this paper, a hierarchical algorithm framework, SPIKE² (pronounced "SPIKE squared"), is proposed to minimize the parallel overhead and to achieve the best utilization of CPU-GPU hybrid systems. For these systems, a layered and adaptive partitioning based on the SPIKE algorithm is presented to effectively control the sequential parts while efficiently exploiting the overlap of computation and communication within a heterogeneous computing node. Moreover, the SPIKE algorithm is reformulated to reduce the matrix computations to only 1/3 in our hierarchical algorithm framework. Meanwhile, an improved implementation of the tiled-PCR-pThomas algorithm is employed for the GPU architecture, and the shared memory usage on the GPU can be reduced by 1/3 through careful dependence analysis when solving unit-vector tridiagonal systems. Our experiments on Tianhe-1A show ideal weak scalability on up to 128 nodes, solving a tridiagonal system of size 1920M in the largest run, and good strong scalability (70%) from 32 nodes to 256 nodes when solving a tridiagonal system of size 480M. Furthermore, the adaptive task partition across the CPU and GPU yields over 10% performance improvement in the strong scaling test with 256 nodes. On one computing node of Tianhe-1A, our GPU-only code can outperform the CUSPARSE version (non-pivoting tridiagonal solver) by 30%, and our hybrid code is about 6.7 times faster than the Intel SPIKE multi-process version for tridiagonal systems of size 3M, 5M, and 15M.
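
For orientation, the sequential Thomas algorithm, which the "pThomas" part of tiled-PCR-pThomas builds on, can be sketched as below. This is a minimal textbook version without pivoting, not the authors' GPU implementation; the function name `thomas_solve` and its argument names are hypothetical and chosen only for illustration.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm (no pivoting).

    All four lists have length n. lower[i] is the sub-diagonal entry in
    row i (lower[0] is ignored) and upper[i] is the super-diagonal entry
    in row i (upper[n-1] is ignored). Assumes no zero pivot occurs, e.g.
    a diagonally dominant matrix.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side

    # Forward elimination: remove the sub-diagonal row by row.
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom

    # Back substitution: recover the unknowns from the last row upward.
    x = [0.0] * n
    x[n - 1] = d[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

Both loops carry a dependence from one row to the next, which is why the elimination is inherently sequential within a single system. Partitioning schemes such as SPIKE address this by splitting the matrix into blocks that can be processed independently.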

Keywords

parallel algorithms; parallel machines; CPU-GPU hybrid systems; SPIKE algorithm; Tianhe-1A; heterogeneous computing node; heterogenous supercomputers; hierarchical algorithm framework; hierarchical tridiagonal system solver; tiled-PCR-pThomas algorithm; unit vector tridiagonal systems; Clustering algorithms; Equations; Graphics processing units; Mathematical model; Matrix decomposition; Partitioning algorithms; Vectors; Tridiagonal system; Heterogeneous supercomputer; GPU

fLanguage

English

Publisher

IEEE

Conference_Titel

Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA), 2014 5th Workshop on

Conference_Location

New Orleans, LA

Type

conf

DOI

10.1109/ScalA.2014.12

Filename

7016736