DocumentCode
668156
Title
HPC runtime support for fast and power efficient locking and synchronization
Author
Akkan, Hakan ; Lang, Michael ; Ionkov, Latchesar
Author_Institution
New Mexico Consortium, Los Alamos, NM, USA
fYear
2013
fDate
23-27 Sept. 2013
Firstpage
1
Lastpage
7
Abstract
As compute nodes increase in parallelism, existing intra-node locking and synchronization primitives need to be scalable, fast, and power efficient. Most parallel runtime systems try to balance these properties during synchronization through fine-tuned spin-waiting and yielding the processor to the OS. Unfortunately, the code path the OS follows to put the processor into a lower power state for idling almost always includes the interrupt processing path. This introduces unnecessary overhead for both the waiting tasks and the task waking them up. In this work we investigate a pair of x86-specific instructions, MONITOR and MWAIT, that can be used to build these primitives with the desired performance and power efficiency properties. This pair of instructions allows a processor to quickly pause execution until another processor wakes it up with a single memory store, avoiding the overhead of switching to the OS idle thread for the waiting task and of sending IPIs for the waking task. We implement a locking primitive using these instructions and evaluate its effectiveness in OpenMP at low to high scales. In these tests we have seen very good scaling, performance improvements of up to 23x, and a 6x power reduction at 64 cores. With these results as motivation, we propose that other high-core-count processors include this type of instruction and make it available to user-space applications.
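The wait-until-store pattern described in the abstract can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `mw_lock`, `wait_for_store`, `mw_try_acquire`, `mw_acquire`, and `mw_release` are hypothetical, and because MONITOR and MWAIT are typically restricted to kernel mode, the wait step here is a portable polling stand-in that only marks where the two instructions would go.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration; not taken from the paper. */
typedef struct {
    _Atomic int held;   /* 0 = free, 1 = held; the word a waiter would monitor */
} mw_lock;

/* Hint to the core that we are in a polling loop (no-op on non-x86 targets). */
static inline void cpu_relax(void) {
#if defined(__x86_64__) || defined(__i386__)
    __builtin_ia32_pause();
#endif
}

/*
 * Stand-in for the MONITOR + MWAIT step (assumption: user space cannot
 * issue those instructions here). With the real instructions a waiter would
 *   1. arm a monitor on the address of l->held (MONITOR),
 *   2. re-check the lock word to close the race with a concurrent release,
 *   3. pause the core until a store hits the monitored line (MWAIT).
 * This fallback simply polls the same word until it reads as free.
 */
static inline void wait_for_store(mw_lock *l) {
    while (atomic_load_explicit(&l->held, memory_order_relaxed) != 0)
        cpu_relax();
}

/* One attempt to take the lock; returns true on success. */
static inline bool mw_try_acquire(mw_lock *l) {
    return atomic_exchange_explicit(&l->held, 1, memory_order_acquire) == 0;
}

/* Take the lock, waiting on the lock word between attempts. */
static inline void mw_acquire(mw_lock *l) {
    while (!mw_try_acquire(l))
        wait_for_store(l);
}

/* A single store releases the lock; with MWAIT this store is also the wake-up. */
static inline void mw_release(mw_lock *l) {
    atomic_store_explicit(&l->held, 0, memory_order_release);
}
```

In this sketch the release path is one memory store with no system call and no IPI, which is the property the abstract highlights. The power savings reported in the abstract would depend on replacing the polling loop with an actual hardware wait.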
Keywords
microprocessor chips; parallel processing; power aware computing; synchronisation; HPC runtime support; IPI; MONITOR; MWAIT; OS; compute nodes; fine-tuned spin-waiting; high-core count processors; interrupt processing path; intranode locking; parallel runtime systems; power efficient locking; power efficient synchronization; power reduction; processor yielding; synchronization primitives; user-space applications; waiting tasks; x86 specific instructions; Instruction sets; Kernel; Linux; Monitoring; Radiation detectors; Standards; Synchronization;
fLanguage
English
Publisher
IEEE
Conference_Titel
Cluster Computing (CLUSTER), 2013 IEEE International Conference on
Conference_Location
Indianapolis, IN
Type
conf
DOI
10.1109/CLUSTER.2013.6702659
Filename
6702659