Regional Information Center for Science and Technology - Interconnection Network for Tightly Coupled Accelerators Architecture

DocumentCode :

3471603

Title :

Interconnection Network for Tightly Coupled Accelerators Architecture

Author :

Hanawa, T. ; Kodama, Yuetsu ; Boku, Taisuke ; Sato, Mitsuhisa

Author_Institution :

Center for Comput. Sci., Univ. of Tsukuba, Tsukuba, Japan

fYear :

2013

fDate :

21-23 Aug. 2013

Firstpage :

Lastpage :

Abstract :

In recent years, heterogeneous clusters using accelerators have entered widespread use in high-performance computing systems. In such clusters, inter-node communication between accelerators normally requires several memory copies via CPU memory, which results in communication latency that causes severe performance degradation. To address this problem, we propose the Tightly Coupled Accelerators (TCA) architecture, which is capable of reducing the communication latency between accelerators on different nodes. In the TCA architecture, PCI Express (PCIe) packets are used for direct inter-node communication between accelerators. In addition, we designed a communication chip that we have named PCI Express Adaptive Communication Hub Version 2 (PEACH2) to realize our proposed TCA architecture. In this paper, we introduce the design and implementation of the PEACH2 chip using a field programmable gate array (FPGA), and present a PEACH2 board designed for use as a PCIe extension board. The results of evaluations using ping-pong programs on an eight-node TCA cluster demonstrate that the PEACH2 chip achieves 95% of the theoretical peak performance and a latency of 0.96 μsec.

Keywords :

field programmable gate arrays; graphics processing units; multiprocessor interconnection networks; network-on-chip; parallel processing; performance evaluation; peripheral interfaces; CPU memory; FPGA; PCI express adaptive communication hub version 2; PCI express packets; PCIe extension board; PCIe packets; PEACH2 board; PEACH2 chip; TCA architecture; TCA cluster; communication chip; communication latency; field programmable gate array; heterogeneous clusters; high-performance computing systems; interconnection network; internode communication; memory copies; performance degradation; tightly coupled accelerator architecture; Bandwidth; Computer architecture; Field programmable gate arrays; Graphics processing units; Performance evaluation; Ports (Computers); Switches; Accelerator computing; GPU cluster; Interconnect for accelerators; PCI Express; Remote DMA;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

High-Performance Interconnects (HOTI), 2013 IEEE 21st Annual Symposium on

Conference_Location :

San Jose, CA

Type :

conf

DOI :

10.1109/HOTI.2013.15

Filename :

6627740

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3471603