DocumentCode
2535756
Title
A G-Line-Based Network for Fast and Efficient Barrier Synchronization in Many-Core CMPs
Author
Abellán, José L. ; Fernández, Juan ; Acacio, Manuel E.
Author_Institution
Dept. de Ing. y Tecnol. de Comput., Univ. de Murcia, Murcia, Spain
fYear
2010
fDate
13-16 Sept. 2010
Firstpage
267
Lastpage
276
Abstract
Barrier synchronization in shared-memory parallel machines has been widely implemented through busy-waiting on shared variables. However, typical implementations tend to produce hot-spots of memory and network contention, creating performance bottlenecks that become markedly more pronounced as the number of cores or processors increases. To overcome these limitations, we present a novel hardware-based barrier mechanism for many-core CMPs. Our proposal is based on global interconnection lines (G-lines) and the S-CSMA technique, which have recently been used to enhance a flow control mechanism (EVC) in networks-on-chip. Based on this technology, we have designed a simple and scalable G-line-based network that operates independently of the main data network and is dedicated to carrying out barrier synchronizations efficiently. In the ideal case, our design takes only 4 cycles to perform a barrier synchronization once all cores or threads have arrived at the barrier. As a proof of concept, we examine the benefits of our proposal by comparing it with one of the best software approaches (a binary combining-tree barrier). To do so, we run several kernels and scientific applications on the Sim-PowerCMP performance simulator, which models a 32-core CMP with a 2D-mesh network configuration. On average, our proposal reduces execution time by 68% for kernels and 21% for scientific applications, and lowers network traffic by 74% and 18%, respectively.
Keywords
network-on-chip; parallel machines; shared memory systems; synchronisation; 2D-mesh network configuration; EVC; G-line-based network; S-CSMA technique; Sim-PowerCMP performance simulator; barrier synchronization; binary combining-tree barrier; data network; flow control mechanism; global interconnection lines; hardware-based barrier mechanism; many-core CMP; network contention; networks-on-chip; shared memory parallel machines; software approach; Hardware; Multiprocessor interconnection; Program processors; Proposals; Radiation detectors; Registers; Synchronization;
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel Processing (ICPP), 2010 39th International Conference on
Conference_Location
San Diego, CA
ISSN
0190-3918
Print_ISBN
978-1-4244-7913-9
Electronic_ISBN
0190-3918
Type
conf
DOI
10.1109/ICPP.2010.34
Filename
5599171