A G-Line-Based Network for Fast and Efficient Barrier Synchronization in Many-Core CMPs

Author

Abellán, José L. ; Fernández, Juan ; Acacio, Manuel E.

Author_Institution

Dept. de Ingeniería y Tecnología de Computadores, Universidad de Murcia, Murcia, Spain

fYear

2010

fDate

13-16 Sept. 2010

Firstpage

267

Lastpage

276

Abstract

Barrier synchronization in shared-memory parallel machines has been widely implemented through busy-waiting on shared variables. However, typical implementations of barrier synchronization tend to produce hot spots of memory and network contention, creating performance bottlenecks that become markedly more pronounced as the number of cores or processors increases. To overcome these limitations, we present a novel hardware-based barrier mechanism for many-core CMPs. Our proposal is based on global interconnection lines (G-lines) and the S-CSMA technique, which have recently been used to enhance a flow control mechanism (EVC) for networks-on-chip. Building on this technology, we have designed a simple and scalable G-line-based network that operates independently of the main data network and is dedicated to performing barrier synchronizations efficiently. In the ideal case, our design takes only 4 cycles to complete a barrier synchronization once all cores or threads have arrived at the barrier. As a proof of concept, we evaluate our proposal against one of the best software approaches, a binary combining-tree barrier. To do so, we run several kernels and scientific applications on top of the Sim-PowerCMP performance simulator, which models a 32-core CMP with a 2D-mesh network. On average, our proposal reduces execution time by 68% for kernels and 21% for scientific applications, and it lowers network traffic by 74% and 18%, respectively.
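For readers unfamiliar with the software baseline named in the abstract, the following is a minimal sketch of a sense-reversing binary combining-tree barrier. It is an illustration under stated assumptions, not the authors' implementation or the configuration evaluated in Sim-PowerCMP: the names CombiningTreeBarrier and run_demo are hypothetical, the thread count is assumed to be a power of two, and a threading.Lock stands in for the atomic fetch-and-decrement a real shared-memory implementation would use.

```python
import threading
import time


class _Node:
    """One node of the combining tree (fan-in 2)."""

    def __init__(self, parent):
        self.parent = parent          # index of the parent node, -1 for the root
        self.count = 2                # arrivals still expected in this episode
        self.sense = False            # release flag; waiters spin until it matches
        self.lock = threading.Lock()  # assumption: stands in for atomic fetch-and-decrement


class CombiningTreeBarrier:
    """Hypothetical sense-reversing binary combining-tree barrier (sketch)."""

    def __init__(self, nthreads):
        # Assumption: nthreads is a power of two and at least 2.
        assert nthreads >= 2 and (nthreads & (nthreads - 1)) == 0
        self.n = nthreads
        # Heap layout: node 0 is the root, node k's parent is (k - 1) // 2.
        self.nodes = [_Node(-1 if k == 0 else (k - 1) // 2)
                      for k in range(nthreads - 1)]

    def wait(self, tid, local_sense):
        """Block until all threads arrive; return the caller's next sense."""
        leaf = (self.n // 2 - 1) + tid // 2   # two threads share each leaf
        self._arrive(leaf, local_sense)
        return not local_sense

    def _arrive(self, k, my_sense):
        node = self.nodes[k]
        with node.lock:
            node.count -= 1
            last = node.count == 0
        if last:
            # Last arriver combines upward, then resets and releases this node.
            if node.parent >= 0:
                self._arrive(node.parent, my_sense)
            node.count = 2           # reset before release so the node is reusable
            node.sense = my_sense    # release the thread waiting at this node
        else:
            # Earlier arriver busy-waits on this node's flag only.
            while node.sense != my_sense:
                time.sleep(0)  # give up the GIL so other threads can progress


def run_demo(nthreads=8, rounds=100):
    """Return (tid, round) pairs where a thread left the barrier too early."""
    barrier = CombiningTreeBarrier(nthreads)
    arrived = [0] * rounds
    arrived_lock = threading.Lock()
    violations = []

    def worker(tid):
        sense = True  # node flags start False, so episode 0 waits for True
        for r in range(rounds):
            with arrived_lock:
                arrived[r] += 1
            sense = barrier.wait(tid, sense)
            if arrived[r] != nthreads:  # someone passed before all had arrived
                violations.append((tid, r))

    workers = [threading.Thread(target=worker, args=(t,)) for t in range(nthreads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return violations


if __name__ == "__main__":
    print("violations:", len(run_demo()))
```

In this sketch, each pair of threads combines at its own leaf and only one thread per node continues upward, so busy-waiting is spread over per-node flags rather than a single shared variable, at the cost of a release that propagates through log2(N) tree levels. The abstract's comparison is against this class of software scheme.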

Keywords

network-on-chip; parallel machines; shared memory systems; synchronisation; 2D-mesh network configuration; EVC; G-line-based network; S-CSMA technique; Sim-PowerCMP performance simulator; barrier synchronization; binary combining-tree barrier; data network; flow control mechanism; global interconnection lines; hardware-based barrier mechanism; many-core CMP; network contention; networks-on-chip; shared memory parallel machines; software approach; Hardware; Multiprocessor interconnection; Program processors; Proposals; Radiation detectors; Registers; Synchronization;

fLanguage

English

Publisher

IEEE

Conference_Title

Parallel Processing (ICPP), 2010 39th International Conference on

Conference_Location

San Diego, CA

ISSN

0190-3918

Print_ISBN

978-1-4244-7913-9

Electronic_ISBN

0190-3918

Type

conf

DOI

10.1109/ICPP.2010.34

Filename

5599171