• DocumentCode
    668156
  • Title
    HPC runtime support for fast and power efficient locking and synchronization
  • Author
    Akkan, Hakan ; Lang, Michael ; Ionkov, Latchesar
  • Author_Institution
    New Mexico Consortium, Los Alamos, NM, USA
  • fYear
    2013
  • fDate
    23-27 Sept. 2013
  • Firstpage
    1
  • Lastpage
    7
  • Abstract
    As compute nodes increase in parallelism, existing intra-node locking and synchronization primitives need to be scalable, fast, and power efficient. Most parallel runtime systems try to find a balance between these properties during synchronization by fine-tuned spin-waiting and processor yielding to the OS. Unfortunately, the code path followed by the OS to put the processor into a lower power state for idling almost always includes the interrupt processing path. This introduces an unnecessary overhead for both the waiting tasks and the task waking them up. In this work we investigate a pair of x86-specific instructions, MONITOR and MWAIT, that can be used to build these primitives with the desired performance and power efficiency properties. This pair of instructions allows a processor to quickly pause execution until another one wakes it up with a single memory store, avoiding the overhead of switching to the idle thread of the OS for the waiting task and of sending IPIs for the waking task. We implement a locking primitive using these instructions and evaluate its effectiveness in OpenMP at low to high scales. In these tests we have seen very good scaling and performance improvements of up to 23x and 6x power reduction at 64 cores. With these results as a motivation, we propose that other high-core-count processors include these types of instructions and make them available to user-space applications.
  • Keywords
    microprocessor chips; parallel processing; power aware computing; synchronisation; HPC runtime support; IPI; MONITOR; MWAIT; OS; compute nodes; fine-tuned spin-waiting; high-core count processors; interrupt processing path; intranode locking; parallel runtime systems; power efficient locking; power efficient synchronization; power reduction; processor yielding; synchronization primitives; user-space applications; waiting tasks; x86 specific instructions; Instruction sets; Kernel; Linux; Monitoring; Radiation detectors; Standards; Synchronization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster Computing (CLUSTER), 2013 IEEE International Conference on
  • Conference_Location
    Indianapolis, IN
  • Type
    conf
  • DOI
    10.1109/CLUSTER.2013.6702659
  • Filename
    6702659