Regional Information Center for Science and Technology - A Sparse Direct Solver for Distributed Memory Xeon Phi-Accelerated Systems

DocumentCode :

723684

Title :

A Sparse Direct Solver for Distributed Memory Xeon Phi-Accelerated Systems

Author :

Sao, Piyush ; Xing Liu ; Vuduc, Richard ; Xiaoye Li

Author_Institution :

Georgia Inst. of Technol., Atlanta, GA, USA

fYear :

2015

fDate :

25-29 May 2015

Firstpage :

Lastpage :

Abstract :

This paper presents the first sparse direct solver for distributed memory systems comprising hybrid multicore CPU and Intel Xeon Phi coprocessors. It builds on the algorithmic approach of SuperLU_DIST, which is right-looking and statically pivoted. Our contribution is a novel algorithm called HALO. The name is shorthand for highly asynchronous lazy offload; it refers to the way the algorithm combines highly aggressive use of asynchrony with accelerated offload, lazy updates, and data shadowing (à la halo or ghost zones), all of which serve to hide and reduce communication, whether to local memory, across the network, or over PCIe. We further augment HALO with a model-driven autotuning heuristic that chooses the intra-node division of labor among CPU and Xeon Phi coprocessor components. When integrated into SuperLU_DIST and evaluated on a variety of realistic test problems in both single-node and multi-node configurations, the resulting implementation achieves speedups of up to 2.5× over an already efficient multicore CPU implementation, and achieves up to 83% of a machine-specific upper bound that we have estimated. Our analysis quantifies how well our implementation performs and allows us to speculate on the potential speedups that might come from a variety of future improvements to the algorithm and system.

Keywords :

coprocessors; distributed memory systems; storage management; HALO algorithm; Intel Xeon Phi coprocessors; PCIe; SuperLU_DIST; accelerated offload; algorithmic approach; communication reduction; data shadowing; distributed memory Xeon Phi-accelerated systems; ghost zone; highly aggressive asynchrony; highly asynchronous lazy offload algorithm; hybrid multicore CPU; intranode division; lazy update; local memory; model-driven autotuning heuristic; multinode configuration; single-node configuration; sparse direct solver; Acceleration; Graphics processing units; Memory management; Microwave integrated circuits; Multicore processing; Parallel processing; Sparse matrices; Communication-avoiding algorithm; GPU; Heterogeneous computing; MPI; OpenMP; Sparse Direct Solver; Xeon-Phi acceleration;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE International

Conference_Location :

Hyderabad

ISSN :

1530-2075

Type :

conf

DOI :

10.1109/IPDPS.2015.104

Filename :

7161497

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=723684