Regional Information Center for Science and Technology (RICeST) - Implementation of CG Method on GPU Cluster with Proprietary Interconnect TCA for GPU Direct Communication

DocumentCode :

3664228

Title :

Implementation of CG Method on GPU Cluster with Proprietary Interconnect TCA for GPU Direct Communication

Author :

Kazuya Matsumoto;Toshihiro Hanawa;Yuetsu Kodama;Hisafumi Fujii;Taisuke Boku

Author_Institution :

Center for Comput. Sci., Univ. of Tsukuba, Tsukuba, Japan

fYear :

2015

fDate :

5/1/2015

Firstpage :

647

Lastpage :

655

Abstract :

We have been developing a proprietary interconnect technology, the Tightly Coupled Accelerators (TCA) architecture, to improve communication latency and bandwidth between compute nodes on a GPU cluster. This paper describes an implementation of the Conjugate Gradient (CG) method using TCA and the results of its performance evaluation on the HA-PACS/TCA system, a proof-of-concept GPU cluster based on the TCA concept. The implementation uses TCA for the allgather and allreduce collective communications. A comparison between the TCA implementation and an MPI implementation shows that TCA reduces latency when gathering relatively small amounts of data in allgather and runs allreduce about twice as fast. As a result, the TCA-based CG implementation outperforms the MPI-based one for sparse matrices with dimensions ranging from thousands to tens of thousands.
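To show where the two collectives named in the abstract typically appear in a distributed CG solver, here is a minimal sketch. It is not the authors' TCA or MPI code. It assumes a block-row partition of a CSR sparse matrix, with allgather used to assemble the full search vector before each local sparse matrix-vector product and allreduce (sum) used for the global dot products. All function names (`allgather`, `allreduce_sum`, `local_spmv`, `distributed_cg`) are hypothetical, and the "ranks" are simulated as lists inside a single process rather than real GPUs.

```python
from typing import List, Tuple

# Hypothetical CSR storage: (indptr, indices, data) for a square sparse matrix.
CSR = Tuple[List[int], List[int], List[float]]


def allgather(pieces: List[List[float]]) -> List[float]:
    """Simulated allgather: every rank receives the concatenation of all slices."""
    full: List[float] = []
    for piece in pieces:
        full.extend(piece)
    return full


def allreduce_sum(partials: List[float]) -> float:
    """Simulated allreduce with a sum operation over per-rank partial results."""
    return sum(partials)


def local_spmv(a: CSR, rows: range, x_full: List[float]) -> List[float]:
    """Multiply only the rows owned by one rank by the full (gathered) vector."""
    indptr, indices, data = a
    return [
        sum(data[k] * x_full[indices[k]] for k in range(indptr[i], indptr[i + 1]))
        for i in rows
    ]


def distributed_cg(a: CSR, b: List[float], nranks: int,
                   tol: float = 1e-10, max_iter: int = 1000) -> List[float]:
    """Unpreconditioned CG on a symmetric positive-definite matrix, block-row split."""
    n = len(b)
    bounds = [n * k // nranks for k in range(nranks + 1)]
    blocks = [range(bounds[k], bounds[k + 1]) for k in range(nranks)]

    x = [[0.0] * len(blk) for blk in blocks]      # local slices of the solution
    res = [[b[i] for i in blk] for blk in blocks]  # residual r = b - A*0 = b
    p = [list(rk) for rk in res]                   # search direction
    rs_old = allreduce_sum([sum(v * v for v in rk) for rk in res])

    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        # allgather: each rank needs the whole search vector for its SpMV rows.
        p_full = allgather(p)
        q = [local_spmv(a, blk, p_full) for blk in blocks]
        # allreduce: global dot product p.q from per-rank partial sums.
        pq = allreduce_sum([sum(pi * qi for pi, qi in zip(p[k], q[k]))
                            for k in range(nranks)])
        alpha = rs_old / pq
        for k in range(nranks):
            x[k] = [xi + alpha * pi for xi, pi in zip(x[k], p[k])]
            res[k] = [ri - alpha * qi for ri, qi in zip(res[k], q[k])]
        # allreduce: global residual norm squared.
        rs_new = allreduce_sum([sum(v * v for v in rk) for rk in res])
        beta = rs_new / rs_old
        p = [[ri + beta * pi for ri, pi in zip(res[k], p[k])] for k in range(nranks)]
        rs_old = rs_new

    return allgather(x)


if __name__ == "__main__":
    # 4x4 tridiagonal SPD matrix [2, -1] with b chosen so that x = [1, 1, 1, 1].
    A = ([0, 2, 5, 8, 10],
         [0, 1, 0, 1, 2, 1, 2, 3, 2, 3],
         [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0])
    print(distributed_cg(A, [1.0, 0.0, 0.0, 1.0], nranks=2))
```

Per iteration, this sketch performs one allgather and two allreduce operations, each reducing a single scalar. Small, latency-bound messages of this kind are the ones the abstract reports TCA accelerating. Whether the paper's implementation arranges its communication exactly this way is an assumption here.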

Keywords :

"Graphics processing units","Sparse matrices","Bandwidth","Performance evaluation","Ports (Computers)","Peer-to-peer computing","Computer architecture"

Publisher :

IEEE

Conference_Title :

Parallel and Distributed Processing Symposium Workshop (IPDPSW), 2015 IEEE International

Type :

conf

DOI :

10.1109/IPDPSW.2015.102

Filename :

7284370

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3664228