Title :
Convergence and scalarization for data-parallel architectures
Author :
Lee, Yunsup ; Krashinsky, Ronny ; Grover, Vinod ; Keckler, Stephen W. ; Asanovic, Krste
Author_Institution :
Univ. of California at Berkeley, Berkeley, CA, USA
Abstract :
Modern throughput processors such as GPUs achieve high performance and efficiency by exploiting data parallelism in application kernels expressed as threaded code. One drawback of this approach compared to conventional vector architectures is redundant execution of instructions that are common across multiple threads, resulting in energy inefficiency due to excess instruction dispatch, register file accesses, and memory operations. This paper proposes to alleviate these overheads while retaining the threaded programming model by automatically detecting the scalar operations and factoring them out of the parallel code. We have developed a scalarizing compiler that employs convergence and variance analyses to statically identify values and instructions that are invariant across multiple threads. Our compiler algorithms are effective at identifying convergent execution even in programs with arbitrary control flow, identifying two-thirds of the opportunity captured by a dynamic oracle. The compile-time analysis leads to a reduction in instructions dispatched by 29%, register file reads and writes by 31%, memory address counts by 47%, and data access counts by 38%.
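The idea of variance analysis mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a toy straight-line three-address IR, treats only the thread index as a source of per-thread variation, and ignores control flow entirely (which the paper's convergence analysis is said to handle). The names KERNEL, variant_values, and scalarizable are hypothetical and chosen only for this illustration.

```python
# Hypothetical three-address instructions: (destination, opcode, source operands).
# This is a toy IR for illustration, not the paper's compiler representation.
KERNEL = [
    ("base", "param", []),             # kernel argument: same for every thread
    ("n", "param", []),                # kernel argument: same for every thread
    ("scale", "mul", ["n", "n"]),      # depends only on thread-invariant values
    ("tid", "thread_id", []),          # differs per thread by definition
    ("off", "mul", ["tid", "scale"]),  # depends on tid, so it varies per thread
    ("addr", "add", ["base", "off"]),  # depends on off, so it varies per thread
    ("val", "load", ["addr"]),         # loaded from a per-thread address
]


def variant_values(instructions):
    """Return the set of values that may differ between threads.

    Assumption: the only seed of variance is the "thread_id" opcode, and a
    value is variant if any of its source operands is variant.
    """
    variant = {dest for dest, op, _ in instructions if op == "thread_id"}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for dest, _, srcs in instructions:
            if dest not in variant and any(s in variant for s in srcs):
                variant.add(dest)
                changed = True
    return variant


def scalarizable(instructions):
    """List destinations that are thread-invariant, in program order."""
    variant = variant_values(instructions)
    return [dest for dest, _, _ in instructions if dest not in variant]


if __name__ == "__main__":
    print("variant:", sorted(variant_values(KERNEL)))
    print("scalarizable:", scalarizable(KERNEL))
```

In this toy kernel, base, n, and scale are computed identically by every thread, so a scalarizing compiler could in principle execute them once rather than once per thread. A real analysis would also have to account for control dependence and memory effects, which this sketch omits.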
Keywords :
convergence; multi-threading; optimising compilers; parallel architectures; power aware computing; GPUs; application kernels; arbitrary control flow; automatic scalar operation detection; compile-time analysis; compiler algorithms; convergent execution; data access counts; data parallelism; data-parallel architecture convergence; data-parallel architecture scalarization; dynamic oracle; energy inefficiency; instruction dispatch; memory address counts; memory operations; parallel code; register file reads; register file writes; scalarizing compiler; threaded code; threaded programming model; throughput processors; vector architectures; Algorithm design and analysis; Computer architecture; Convergence; Graphics processing units; Instruction sets; Kernel; Registers; CUDA; GPU; Scalarization;
Conference_Title :
Code Generation and Optimization (CGO), 2013 IEEE/ACM International Symposium on
Conference_Location :
Shenzhen
Print_ISBN :
978-1-4673-5524-7
DOI :
10.1109/CGO.2013.6494995