Title :
Performance Characterization and Optimization of Atomic Operations on AMD GPUs
Author :
Elteir, Marwa ; Lin, Heshan ; Feng, Wu-chun
Author_Institution :
Dept. of Comput. Sci., Virginia Tech, Blacksburg, VA, USA
Abstract :
Atomic operations are important building blocks in supporting general-purpose computing on graphics processing units (GPUs). For instance, they can be used to coordinate execution between concurrent threads and, in turn, assist in constructing complex data structures such as hash tables or in implementing GPU-wide barrier synchronization. While the performance of atomic operations has improved substantially on the latest NVIDIA Fermi-based GPUs, system-provided atomic operations still incur significant performance penalties on AMD GPUs. A memory-bound kernel on an AMD GPU, for example, can suffer severe performance degradation when it includes an atomic operation, even if the atomic operation is never executed. In this paper, we first quantify the performance impact of atomic instructions on application kernels on AMD GPUs. We then propose a novel software-based implementation of atomic operations that can significantly improve overall kernel performance. We evaluate its performance against the system-provided atomic operations using two micro-benchmarks and four real applications. The results show that using our software-based atomic operations on an AMD GPU can speed up an application kernel by 67-fold over the same kernel using the (default) system-provided atomic operations.
Keywords :
coprocessors; data structures; synchronisation; AMD GPU; GPU-wide barrier synchronization; NVIDIA Fermi-based GPU; application kernels; atomic instructions; complex data structure construction; concurrent threads; four real applications; general-purpose computing; graphics processing units; hash tables; kernel performance; memory-bound kernel; micro-benchmarks; optimization; performance characterization; performance degradation; software based atomic operations; software-based implementation; system-provided atomic operations; Arrays; Graphics processing unit; High definition video; Instruction sets; Kernel; Synchronization; GPGPU; GPU; MapReduce; atomic operations; heterogeneous computing
Conference_Title :
Cluster Computing (CLUSTER), 2011 IEEE International Conference on
Conference_Location :
Austin, TX
Print_ISBN :
978-1-4577-1355-2
Electronic_ISBN :
978-0-7695-4516-5
DOI :
10.1109/CLUSTER.2011.34