  • DocumentCode
    3588956
  • Title
    Analyzing Put/Get APIs for Thread-Collaborative Processors
  • Author
    Klenk, Benjamin ; Oden, Lena ; Froening, Holger
  • Author_Institution
    Inst. of Comput. Eng., Univ. of Heidelberg, Heidelberg, Germany
  • fYear
    2014
  • Firstpage
    411
  • Lastpage
    418
  • Abstract
    In High-Performance Computing (HPC), GPU-based accelerators are pervasive for two reasons. First, GPUs provide much higher raw computational power than traditional CPUs. Second, power consumption increases sub-linearly with performance, making GPUs much more energy-efficient than CPUs in terms of GFLOPS/Watt. Although these advantages are limited to a select set of workloads, most HPC applications can benefit substantially from GPUs. The top 11 entries of the current Green500 list (November 2013) are all GPU-accelerated systems, which supports these statements. For system architects, however, GPUs are challenging to use: their architecture is based on thread-collaborative execution and differs significantly from that of CPUs, which are mainly optimized for single-thread performance. The interfaces to other devices in a system, in particular the network device, are still optimized solely for CPUs. This makes GPU-controlled IO a challenge, although it is desirable because it can save both energy and time. This is especially true for network devices, which are a key component of HPC systems. In previous work we have shown that GPUs can directly source and sink network traffic on InfiniBand devices without any involvement of the host CPU, but this approach does not provide any performance benefit. Here we explore another API for Put/Get operations that can overcome some of these limitations. In particular, we provide a detailed analysis of the issues that prevent performance advantages when IO is controlled directly from the GPU domain.
  • Keywords
    application program interfaces; graphics processing units; multiprocessing systems; parallel processing; GPU-controlled IO; HPC systems; high-performance computing; put/get API; thread-collaborative processors; Bandwidth; Data transfer; Graphics processing units; Instruction sets; Kernel; Performance evaluation; Programming;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Processing Workshops (ICPPW), 2014 43rd International Conference on
  • ISSN
    1530-2016
  • Type
    conf
  • DOI
    10.1109/ICPPW.2014.61
  • Filename
    7103479