Algorithm Flattening: Complete branch elimination for GPU requires a paradigm shift from CPU thinking

Author

Lucas Vespa; Alexander Bauman; Jenny Wells

Author_Institution

University of Illinois Springfield, 62703, United States

Year

2015

Abstract

Graphics processing units (GPUs) have inadvertently become supercomputers in their own right, to the benefit of applications outside of graphics. Speedups of multiple orders of magnitude have been achieved in scientific computing, co-processing, and similar areas. However, the Single Instruction Multiple Data (SIMD) design of GPUs is extremely sensitive to thread divergence, so much so that for many applications the performance improvement from GPUs is all but erased by it. This problem has driven general-purpose GPU computing toward finding “appropriate” applications to accelerate, rather than accelerating the applications that need performance improvements. Thread divergence is generally caused by branches. Previous research has addressed reducing branches, but none of this work aims to eliminate branches entirely, because the methods required for complete branch elimination are a drastic de-optimization for CPU. We present Algorithm Flattening (AF), a de-optimization for CPU that completely removes all branches and results in a significant optimization for GPU-accelerated applications. AF eliminates thread divergence, substantially decreases execution time, allows algorithms that previously did not fully utilize GPU capability to be implemented on GPU, and generates deterministic performance. AF removes branches, replacing them with a reduced equation, and achieves a substantial speedup of algorithms and applications that are already GPU accelerated. We believe that AF will have a significant impact on high performance computing, as it is a long-needed solution that allows unprecedented use of GPUs for general-purpose applications.
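
The abstract does not spell out how a branch is turned into a "reduced equation", so the following is only a minimal sketch of the general idea, not the authors' construction. It assumes a hypothetical per-element computation with one data-dependent branch and shows how the two outcomes can be folded into a single arithmetic expression. The function names and the parameters `t`, `a`, `b`, `c`, `d` are illustrative assumptions. Python is used for readability; the same expression could be written inside a GPU kernel.

```python
def branchy(x, t, a, b, c, d):
    """Reference version: the path taken depends on the data."""
    if x > t:
        return a * x + b
    return c * x + d


def flattened(x, t, a, b, c, d):
    """Branch-free version: one arithmetic expression for every input."""
    m = int(x > t)  # predicate as 0 or 1; no control flow depends on it
    # Blend of both outcomes, m*(a*x + b) + (1 - m)*(c*x + d),
    # algebraically reduced to a single linear expression in x:
    return (c + m * (a - c)) * x + (d + m * (b - d))
```

In a sketch like this, every input runs the same sequence of operations, so threads executing it in lockstep would have nothing to diverge on. The cost is extra arithmetic on every element, which is consistent with the abstract's description of the approach as a de-optimization for CPU.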

Keywords

"Graphics processing units","Optimization","Acceleration","Instruction sets","Mathematical model","Kernel"

Publisher

IEEE

Conference_Title

High Performance Extreme Computing Conference (HPEC), 2015 IEEE

Type

conf

DOI

10.1109/HPEC.2015.7322477

Filename

7322477

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=3687129