DocumentCode
3664231
Title
Implementing Cross-Device Atomics in Heterogeneous Processors
Author
Meghana Gupta;Dibyendu Das;Prakash Raghavendra;Tony Tye;Leonid Lobachev;Amit Agarwal;Ravish Hegde
Author_Institution
Advanced Micro Devices, Inc. (AMD), USA
fYear
2015
fDate
5/1/2015 12:00:00 AM
Firstpage
659
Lastpage
668
Abstract
This paper describes how to support atomics across multiple devices in heterogeneous processors. Specifically, it gives an overview of how OpenCL 2.0 and Heterogeneous System Architecture (HSA) atomics are supported on integrated CPU-GPU processors called Accelerated Processing Units (APUs). The C11 and C++11 standards recently introduced atomics and an associated memory model to support scalable parallel programming with memory consistency semantics. OpenCL 2.0 extends these atomics to multiple devices, each of which can be a CPU or a GPU. The HSA Foundation has likewise included support for various atomic operations that span multiple devices in its HSA Intermediate Language (HSAIL) standard. Together, these paradigms enable parallel threads running simultaneously on CPU and GPU cores to synchronize using atomics, which was not possible earlier. In APUs, the CPU and GPU cores are on the same die and can access a unified memory. Such a platform therefore provides an excellent opportunity for showcasing the power of OpenCL 2.0/HSA atomics across devices (henceforth referred to as cross-device atomics). In this work we show how we added capabilities to our LLVM-based OpenCL compiler and a JIT-like finalizer to support cross-device atomics for APUs. In addition, by supporting the new HSAIL atomic virtual operations in our finalizer, we enable other high-level languages that translate to HSAIL to support cross-device atomics as part of their evolving language standards. Our compiler is one of the first to support such cross-device atomics.
Keywords
"Instruction sets","Graphics processing units","Synchronization","Standards","Load modeling","Central Processing Unit"
Publisher
IEEE
Conference_Titel
Parallel and Distributed Processing Symposium Workshop (IPDPSW), 2015 IEEE International
Type
conf
DOI
10.1109/IPDPSW.2015.40
Filename
7284373