Analyzing Put/Get APIs for Thread-Collaborative Processors

Author

Klenk, Benjamin ; Oden, Lena ; Froening, Holger

Author_Institution

Inst. of Comput. Eng., Univ. of Heidelberg, Heidelberg, Germany

fYear

2014

Firstpage

411

Lastpage

418

Abstract

In High-Performance Computing (HPC), GPU-based accelerators are pervasive for two reasons: first, GPUs provide much higher raw computational power than traditional CPUs. Second, power consumption increases sub-linearly with the performance increase, making GPUs much more energy-efficient in terms of GFLOPS/Watt than CPUs. Although these advantages are limited to a selected set of workloads, most HPC applications can benefit substantially from GPUs. The top 11 entries of the current Green500 list (November 2013) are all GPU-accelerated systems, which supports the previous statements. For system architects, however, the use of GPUs is challenging, as their architecture is based on thread-collaborative execution and differs significantly from CPUs, which are mainly optimized for single-thread performance. The interfaces to other devices in a system, in particular the network device, are still solely optimized for CPUs. This makes GPU-controlled IO a challenge, although it is desirable for savings in terms of energy and time. This is especially true for network devices, which are a key component in HPC systems. In previous work we have shown that GPUs can directly source and sink network traffic for InfiniBand devices without any involvement of the host CPUs, but this approach does not provide any performance benefits. Here we explore another API for Put/Get operations that can overcome some limitations. In particular, we reason in detail about the issues that prevent performance advantages when directly controlling IO from the GPU domain.

Keywords

application program interfaces; graphics processing units; multiprocessing systems; parallel processing; GPU-controlled IO; HPC systems; high-performance computing; put/get API; thread-collaborative processors; Bandwidth; Data transfer; Graphics processing units; Instruction sets; Kernel; Performance evaluation; Programming

fLanguage

English

Publisher

IEEE

Conference_Titel

Parallel Processing Workshops (ICPPW), 2014 43rd International Conference on

ISSN

1530-2016

Type

conf

DOI

10.1109/ICPPW.2014.61

Filename

7103479