Regional Information Center for Science and Technology - Scalar Waving: Improving the Efficiency of SIMD Execution on GPUs

DocumentCode :

1783202

Title :

Scalar Waving: Improving the Efficiency of SIMD Execution on GPUs

Author :

Yilmazer, Ayse ; Zhongliang Chen ; Kaeli, David

Author_Institution :

Electr. & Comput. Eng. Dept., Northeastern Univ., Boston, MA, USA

fYear :

2014

fDate :

19-23 May 2014

Firstpage :

103

Lastpage :

112

Abstract :

GPUs take advantage of uniformity in program control flow and utilize SIMD execution to obtain execution efficiency. In SIMD execution, threads are batched into SIMD groups that share a common program counter and execute identical instructions on SIMD pipelines. Previous research has shown that a significant number of scalar instructions - instructions where different threads in a SIMD group execute using the same input operands and generate the exact same output - are present in a range of applications. GPUs eliminate redundant fetches and decodes by utilizing a shared common pipeline front-end. However, most GPUs do not handle scalar instructions efficiently, allowing these instructions to be redundantly executed by the threads in a SIMD group. In this paper, we propose to use scalar execution to eliminate redundant execution of scalar instructions. We introduce scalar waving as a mechanism to batch scalar operations possessing the same PC and execute them as a group on SIMD lanes for efficiency. We also propose simultaneous execution of dynamically-formed scalar waves with SIMD groups to overcome the under-utilization of SIMD lanes when encountering divergence. We evaluate our work using 22 different GPU benchmarks taken from 4 different benchmark suites. We evaluate a range of configurations using timing simulation. Our results show that scalar waving can obtain up to a 25% improvement in performance on average. Our experiments also provide insight into the amount of performance gain that we can expect with scalar waving as a function of the scalar content, occupancy, and memory characteristics of the target application.
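
The two ideas in the abstract, detecting a scalar instruction and batching scalar operations that share a PC, can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware mechanism evaluated in the paper: the lane count `SIMD_WIDTH`, the helper names `is_scalar` and `form_scalar_waves`, and the tuple layout of a pending operation are all hypothetical and chosen for illustration.

```python
from collections import defaultdict

SIMD_WIDTH = 4  # assumed lane count, for illustration only


def is_scalar(operands_per_thread):
    """Return True if every thread in a SIMD group supplies identical input operands."""
    first = operands_per_thread[0]
    return all(ops == first for ops in operands_per_thread)


def form_scalar_waves(pending, simd_width=SIMD_WIDTH):
    """Batch scalar operations that share a PC into waves of at most simd_width.

    pending: list of (group_id, pc, operands) tuples, one representative
             operation per SIMD group whose instruction was found to be scalar.
    Returns a list of (pc, [(group_id, operands), ...]) waves.
    """
    by_pc = defaultdict(list)
    for group_id, pc, operands in pending:
        by_pc[pc].append((group_id, operands))

    waves = []
    for pc, ops in by_pc.items():
        # Each wave occupies up to simd_width lanes, one lane per SIMD group.
        for start in range(0, len(ops), simd_width):
            waves.append((pc, ops[start:start + simd_width]))
    return waves


# Three SIMD groups reach the same (hypothetical) PC with per-thread operands.
groups = {
    0: [(3, 5)] * 4,                      # identical operands: scalar
    1: [(7, 1)] * 4,                      # identical operands: scalar
    2: [(0, 1), (1, 1), (2, 1), (3, 1)],  # operands differ: not scalar
}
pc = 0x40

# Keep one representative operation for each group whose instruction is scalar.
pending = [(gid, pc, ops[0]) for gid, ops in groups.items() if is_scalar(ops)]
waves = form_scalar_waves(pending)

# Lane slots used by the two scalar groups: redundant execution vs. one wave.
redundant_lane_ops = sum(len(groups[gid]) for gid, _, _ in pending)
waved_lane_ops = sum(len(ops) for _, ops in waves)
```

In this toy model the two scalar groups would occupy eight lane slots if each thread executed redundantly, while a single scalar wave covers both with two lanes. Real gains depend on scalar content, occupancy, and memory behavior, as the abstract notes.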

Keywords :

graphics processing units; instruction sets; multiprocessing systems; pipeline processing; GPU benchmarks; SIMD execution efficiency; batch scalar operations; dynamically-formed scalar waves; program control flow; program counter; scalar instructions; scalar waving; Benchmark testing; Computer architecture; Graphics processing units; Instruction sets; Pipelines; Registers; GPU; Redundant Computation; SIMD Efficiency; Scalar Waving;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Parallel and Distributed Processing Symposium, 2014 IEEE 28th International

Conference_Location :

Phoenix, AZ

ISSN :

1530-2075

Print_ISBN :

978-1-4799-3799-8

Type :

conf

DOI :

10.1109/IPDPS.2014.22

Filename :

6877246

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1783202