DocumentCode :
3471603
Title :
Interconnection Network for Tightly Coupled Accelerators Architecture
Author :
Hanawa, T. ; Kodama, Yuetsu ; Boku, Taisuke ; Sato, Mitsuhisa
Author_Institution :
Center for Computational Sciences, University of Tsukuba, Tsukuba, Japan
fYear :
2013
fDate :
21-23 Aug. 2013
Firstpage :
79
Lastpage :
82
Abstract :
In recent years, heterogeneous clusters using accelerators have entered widespread use in high-performance computing systems. In such clusters, inter-node communication between accelerators normally requires several memory copies via CPU memory, and the resulting communication latency causes severe performance degradation. To address this problem, we propose the Tightly Coupled Accelerators (TCA) architecture, which reduces the communication latency between accelerators on different nodes. In the TCA architecture, PCI Express (PCIe) packets are used for direct inter-node communication between accelerators. In addition, we designed a communication chip, named PCI Express Adaptive Communication Hub Version 2 (PEACH2), to realize the proposed TCA architecture. In this paper, we introduce the design and implementation of the PEACH2 chip using a field-programmable gate array (FPGA), and present a PEACH2 board designed for use as a PCIe extension board. The results of evaluations using ping-pong programs on an eight-node TCA cluster demonstrate that the PEACH2 chip achieves 95% of the theoretical peak performance and a latency of 0.96 μsec.
Keywords :
field programmable gate arrays; graphics processing units; multiprocessor interconnection networks; network-on-chip; parallel processing; performance evaluation; peripheral interfaces; CPU memory; FPGA; PCI express adaptive communication hub version 2; PCI express packets; PCIe extension board; PCIe packets; PEACH2 board; PEACH2 chip; TCA architecture; TCA cluster; communication chip; communication latency; field programmable gate array; heterogeneous clusters; high-performance computing systems; interconnection network; internode communication; memory copies; performance degradation; tightly coupled accelerator architecture; Bandwidth; Computer architecture; Field programmable gate arrays; Graphics processing units; Performance evaluation; Ports (Computers); Switches; Accelerator computing; GPU cluster; Interconnect for accelerators; PCI Express; Remote DMA;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
High-Performance Interconnects (HOTI), 2013 IEEE 21st Annual Symposium on
Conference_Location :
San Jose, CA
Type :
conf
DOI :
10.1109/HOTI.2013.15
Filename :
6627740