• DocumentCode
    3687129
  • Title
    Algorithm Flattening: Complete branch elimination for GPU requires a paradigm shift from CPU thinking
  • Author
    Lucas Vespa;Alexander Bauman;Jenny Wells
  • Author_Institution
    University of Illinois Springfield, 62703, United States
  • fYear
    2015
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Graphics processing units (GPUs) have inadvertently become supercomputers in and of themselves, to the benefit of applications outside of graphics. Acceleration of multiple orders of magnitude has been achieved in scientific computing, co-processing and the like. However, the Single Instruction Multiple Data (SIMD) design of GPUs is extremely sensitive to thread divergence, so much so that, for many applications, the performance improvement from GPUs is all but eviscerated by it. This problem has driven general-purpose GPU computing in the direction of finding “appropriate” applications to accelerate, rather than accelerating applications with a need for performance improvements. Thread divergence is generally caused by branches. Previous research has addressed the issue of reducing branches, but none of this work aims to eliminate branches entirely, because the methods required for complete branch elimination are a drastic de-optimization for CPU. We present Algorithm Flattening (AF), a de-optimization for CPU that completely removes all branches and results in a significant optimization for GPU-accelerated applications. AF eliminates thread divergence, substantially decreases execution time, allows for the implementation of algorithms on GPU that previously did not fully utilize GPU capability, and generates deterministic performance. AF removes branches, replacing them with a reduced equation, and achieves a substantial speedup of already GPU-accelerated algorithms and applications. We believe that AF will have a significant impact on high-performance computing, as it is a long-needed solution that allows unprecedented use of GPUs for general-purpose applications.
  • Keywords
    "Graphics processing units","Optimization","Acceleration","Instruction sets","Mathematical model","Kernel"
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Extreme Computing Conference (HPEC), 2015 IEEE
  • Type
    conf
  • DOI
    10.1109/HPEC.2015.7322477
  • Filename
    7322477