Regional Information Center for Science and Technology - Convergence and scalarization for data-parallel architectures

DocumentCode :

1902642

Title :

Convergence and scalarization for data-parallel architectures

Author :

Lee, Yunsup ; Krashinsky, Ronny ; Grover, Vinod ; Keckler, Stephen W. ; Asanovic, Krste

Author_Institution :

Univ. of California at Berkeley, Berkeley, CA, USA

fYear :

2013

fDate :

23-27 Feb. 2013

Firstpage :

Lastpage :

Abstract :

Modern throughput processors such as GPUs achieve high performance and efficiency by exploiting data parallelism in application kernels expressed as threaded code. One drawback of this approach compared to conventional vector architectures is redundant execution of instructions that are common across multiple threads, resulting in energy inefficiency due to excess instruction dispatch, register file accesses, and memory operations. This paper proposes to alleviate these overheads while retaining the threaded programming model by automatically detecting the scalar operations and factoring them out of the parallel code. We have developed a scalarizing compiler that employs convergence and variance analyses to statically identify values and instructions that are invariant across multiple threads. Our compiler algorithms are effective at identifying convergent execution even in programs with arbitrary control flow, identifying two-thirds of the opportunity captured by a dynamic oracle. The compile-time analysis reduces instructions dispatched by 29%, register file reads and writes by 31%, memory address counts by 47%, and data access counts by 38%.
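
The idea of statically separating thread-invariant values from thread-varying ones can be illustrated with a minimal sketch. It is not the authors' algorithm: it handles only straight-line code and ignores control flow entirely, which is where the paper's convergence analysis comes in. All names here (`Instr`, `find_variant`, the `tid` seed, and the toy kernel) are hypothetical and chosen only for illustration. A value is treated as variant if it is the thread index or is computed from any variant operand; everything else is a candidate for scalar execution.

```python
# Toy variance analysis over straight-line, SSA-style code.
# Assumption: Instr, find_variant and the "tid" seed are hypothetical names;
# this is an illustrative sketch, not the compiler described in the paper.
from typing import NamedTuple


class Instr(NamedTuple):
    dest: str     # name of the value the instruction defines
    op: str       # opcode, kept only for readability
    srcs: tuple   # names of the values the instruction reads


def find_variant(instrs, thread_varying_seeds):
    """Return the set of values that may differ between threads."""
    variant = set(thread_varying_seeds)
    changed = True
    while changed:  # iterate until no new variant value is found
        changed = False
        for ins in instrs:
            if ins.dest not in variant and any(s in variant for s in ins.srcs):
                variant.add(ins.dest)
                changed = True
    return variant


# A made-up kernel body: y = a[tid] + 4 * n
kernel = [
    Instr("base", "ld.param", ("a_ptr",)),    # same pointer for every thread
    Instr("n4", "mul", ("n", "four")),        # depends only on uniform inputs
    Instr("off", "mul", ("tid", "four")),     # depends on the thread index
    Instr("addr", "add", ("base", "off")),
    Instr("x", "ld.global", ("addr",)),
    Instr("y", "add", ("x", "n4")),
]

variant = find_variant(kernel, {"tid"})
scalar = [ins.dest for ins in kernel if ins.dest not in variant]
vector = [ins.dest for ins in kernel if ins.dest in variant]
print("scalar candidates:", scalar)
print("per-thread values:", vector)
```

In this sketch, `base` and `n4` are the only values that could be computed once and shared, while everything derived from `tid` stays per-thread. A real compiler would also have to account for control dependence, since a value defined under a divergent branch can vary even when its operands do not.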

Keywords :

convergence; multi-threading; optimising compilers; parallel architectures; power aware computing; GPUs; application kernels; arbitrary control flow; automatic scalar operation detection; compile-time analysis; compiler algorithms; convergent execution; data access counts; data parallelism; data-parallel architecture convergence; data-parallel architecture scalarization; dynamic oracle; energy inefficiency; instruction dispatch; memory address counts; memory operations; parallel code; register file reads; register file writes; scalarizing compiler; threaded code; threaded programming model; throughput processors; vector architectures; Algorithm design and analysis; Computer architecture; Convergence; Graphics processing units; Instruction sets; Kernel; Registers; CUDA; GPU; Scalarization;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Code Generation and Optimization (CGO), 2013 IEEE/ACM International Symposium on

Conference_Location :

Shenzhen

Print_ISBN :

978-1-4673-5524-7

Type :

conf

DOI :

10.1109/CGO.2013.6494995

Filename :

6494995

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1902642