Implementing Cross-Device Atomics in Heterogeneous Processors

Author

Meghana Gupta;Dibyendu Das;Prakash Raghavendra;Tony Tye;Leonid Lobachev;Amit Agarwal;Ravish Hegde

Author_Institution

Adv. Micro Devices, Inc. (AMD), USA

fYear

2015

fDate

5/1/2015 12:00:00 AM

Firstpage

659

Lastpage

668

Abstract

This paper describes how to support atomics across multiple devices in heterogeneous processors. Specifically, it gives an overview of how OpenCL 2.0 and Heterogeneous System Architecture (HSA) atomics are supported on integrated CPU-GPU processors called Accelerated Processing Units (APUs). The C11 and C++11 standards recently introduced atomics and an associated memory model to support scalable parallel programming with memory consistency semantics. The OpenCL 2.0 revision extends these atomics to multiple devices, each of which can be a CPU or a GPU. The HSA Foundation has likewise included support for various atomic operations that span multiple devices in its HSA Intermediate Language (HSAIL) standard. Together, these paradigms let parallel threads running simultaneously on CPU and GPU cores synchronize through atomics, which was not possible earlier. In APUs, the CPU and GPU cores are on the same die and can access a unified memory. Such a platform therefore provides an excellent opportunity to showcase the power of OpenCL 2.0/HSA atomics across devices (henceforth referred to as cross-device atomics). In this work we show how we added capabilities to our LLVM-based OpenCL compiler and a JIT-like finalizer to support cross-device atomics on APUs. In addition, by supporting the new HSAIL atomic virtual operations in our finalizer, we enable other high-level languages that translate to HSAIL to support cross-device atomics as part of their evolving language standards. Our compiler is one of the first to support such cross-device atomics.
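
The synchronization pattern the abstract refers to can be illustrated with plain C++11 atomics. The sketch below is not taken from the paper and does not use OpenCL or HSAIL. It assumes two host threads standing in for a CPU producer and a GPU consumer, and the function name `handoff_through_flag` is hypothetical. It shows a release store paired with an acquire load, the kind of ordering that cross-device atomics carry over to memory shared between devices.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper (not from the paper): two host threads stand in for
// a CPU-side producer and a GPU-side consumer sharing unified memory.
int handoff_through_flag(int value) {
    int payload = 0;              // ordinary, non-atomic shared data
    std::atomic<int> ready{0};    // atomic flag used for synchronization
    int observed = -1;

    std::thread consumer([&] {
        // Acquire load: once the flag reads 1, the producer's earlier
        // write to payload is guaranteed to be visible here.
        while (ready.load(std::memory_order_acquire) == 0) {
            std::this_thread::yield();
        }
        observed = payload;
    });

    std::thread producer([&] {
        payload = value;
        // Release store: publishes the payload write before the flag.
        ready.store(1, std::memory_order_release);
    });

    producer.join();
    consumer.join();
    return observed;
}
```

Calling `handoff_through_flag(42)` returns the value written by the producer thread. In OpenCL 2.0 the corresponding atomic operations additionally take a memory scope argument, which is what lets the ordering extend beyond a single device.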

Keywords

"Instruction sets","Graphics processing units","Synchronization","Standards","Load modeling","Central Processing Unit"

Publisher

IEEE

Conference_Titel

Parallel and Distributed Processing Symposium Workshop (IPDPSW), 2015 IEEE International

Type

conf

DOI

10.1109/IPDPSW.2015.40

Filename

7284373