• DocumentCode
    2535756
  • Title
    A G-Line-Based Network for Fast and Efficient Barrier Synchronization in Many-Core CMPs

  • Author
    Abellán, José L. ; Fernández, Juan ; Acacio, Manuel E.

  • Author_Institution
    Dept. de Ing. y Tecnol. de Comput., Univ. de Murcia, Murcia, Spain
  • fYear
    2010
  • fDate
    13-16 Sept. 2010
  • Firstpage
    267
  • Lastpage
    276
  • Abstract
    Barrier synchronization in shared-memory parallel machines has been widely implemented through busy-waiting on shared variables. However, typical implementations of barrier synchronization tend to produce hot-spots in terms of memory and network contention, creating performance bottlenecks that become markedly more pronounced as the number of cores or processors increases. To overcome such limitations, we present a novel hardware-based barrier mechanism in the context of many-core CMPs. Our proposal is based on global interconnection lines (G-lines) and the S-CSMA technique, which have recently been used to enhance a flow control mechanism (EVC) in the context of networks-on-chip. Based on this technology, we have designed a simple and scalable G-line-based network that operates independently of the main data network and is aimed at carrying out barrier synchronizations efficiently. In the ideal case, our design takes only 4 cycles to perform a barrier synchronization once all cores or threads have arrived at the barrier. As a proof of concept, we examine the benefits of our proposal by comparing it with one of the best software approaches (a binary combining-tree barrier). To do so, we run several kernels and scientific applications on top of the Sim-PowerCMP performance simulator, which models a 32-core CMP with a 2D-mesh network configuration. Our proposal reduces execution time by 68% and 21% on average for kernels and scientific applications, respectively. Network traffic is also lowered by 74% and 18%, respectively.
  • Keywords
    network-on-chip; parallel machines; shared memory systems; synchronisation; 2D-mesh network configuration; EVC; G-line-based network; S-CSMA technique; Sim-PowerCMP performance simulator; barrier synchronization; binary combining-tree barrier; data network; flow control mechanism; global interconnection lines; hardware-based barrier mechanism; many-core CMP; network contention; networks-on-chip; shared memory parallel machines; software approach; Hardware; Multiprocessor interconnection; Program processors; Proposals; Radiation detectors; Registers; Synchronization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Processing (ICPP), 2010 39th International Conference on
  • Conference_Location
    San Diego, CA
  • ISSN
    0190-3918
  • Print_ISBN
    978-1-4244-7913-9
  • Electronic_ISBN
    0190-3918
  • Type
    conf
  • DOI
    10.1109/ICPP.2010.34
  • Filename
    5599171