• DocumentCode
    3664231
  • Title

    Implementing Cross-Device Atomics in Heterogeneous Processors

  • Author

    Meghana Gupta;Dibyendu Das;Prakash Raghavendra;Tony Tye;Leonid Lobachev;Amit Agarwal;Ravish Hegde

  • Author_Institution
    Adv. Micro Devices, Inc. (AMD), USA
  • fYear
    2015
  • fDate
    5/1/2015 12:00:00 AM
  • Firstpage
    659
  • Lastpage
    668
  • Abstract
    In this paper we describe how to support atomics across multiple devices in heterogeneous processors. Specifically, this paper provides an overview of how OpenCL 2.0 and Heterogeneous System Architecture (HSA) atomics are supported on integrated CPU-GPU processors called Accelerated Processing Units (APUs). Recently, the C11 and C++11 standards have introduced atomics and an associated memory model for supporting scalable parallel programming with memory consistency semantics. OpenCL 2.0 revision has extended these atomics for multiple devices each one of which can be a CPU or a GPU. The HSA Foundation in the HSA intermediate language (HSAIL) standard has also included support for various atomic operations that span multiple devices. All of these paradigms enable parallel threads running simultaneously on the CPU and GPU cores to synchronize using atomics that were not possible earlier. In APUs, the CPU and GPU cores are on the same die and can access a unified memory. Hence, such a platform provides an excellent opportunity for showcasing the power of OpenCL 2.0/HSA atomics across devices (henceforth referred to as cross-device atomics). In this work we show how we have added capabilities in our LLVM-based OpenCL compiler and a JIT-like finalizer to support cross-device atomics for APUs. Also, by supporting the new HSAIL atomic virtual operations in our finalizer, we have enabled the capability whereby other high-level languages which translate to HSAIL can support cross-device atomics as part of their evolving language standard. Our compiler is one of the first to support such cross-device atomics.
  • Keywords
    "Instruction sets","Graphics processing units","Synchronization","Standards","Load modeling","Central Processing Unit"
  • Publisher
    ieee
  • Conference_Titel
    Parallel and Distributed Processing Symposium Workshop (IPDPSW), 2015 IEEE International
  • Type

    conf

  • DOI
    10.1109/IPDPSW.2015.40
  • Filename
    7284373