DocumentCode :
1783202
Title :
Scalar Waving: Improving the Efficiency of SIMD Execution on GPUs
Author :
Yilmazer, Ayse ; Chen, Zhongliang ; Kaeli, David
Author_Institution :
Electr. & Comput. Eng. Dept., Northeastern Univ., Boston, MA, USA
fYear :
2014
fDate :
19-23 May 2014
Firstpage :
103
Lastpage :
112
Abstract :
GPUs take advantage of uniformity in program control flow and utilize SIMD execution to obtain execution efficiency. In SIMD execution, threads are batched into SIMD groups to share a common program counter and execute identical instructions on SIMD pipelines. Previous research has shown that there is a significant number of scalar instructions - instructions where different threads in a SIMD group execute using the same input operands and generate the exact same output - present in a range of applications. GPUs eliminate redundant fetches and decodes by utilizing a shared common pipeline front-end. However, most GPUs do not handle scalar instructions efficiently, allowing these instructions to be redundantly executed by the threads in a SIMD group. In this paper, we propose to use scalar execution to eliminate redundant execution of scalar instructions. We introduce scalar waving as a mechanism to batch scalar operations possessing the same PC and execute them as a group on SIMD lanes for efficiency. We also propose simultaneous execution of dynamically-formed scalar waves with SIMD groups to overcome the under-utilization of SIMD lanes when encountering divergence. We evaluate our work using 22 different GPU benchmarks taken from 4 different benchmark suites. We evaluate a range of configurations using timing simulation. Our results show that scalar waving can obtain up to a 25% improvement in performance on average. Our experiments also provide insight into the amount of performance gain that we can expect with scalar waving as a function of the scalar content, occupancy, and memory characteristics of the target application.
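The batching idea in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware mechanism the paper evaluates: the names `is_scalar` and `form_scalar_waves`, the tuple layout of `pending`, and the `simd_width` parameter are all hypothetical. An instruction is treated as scalar when every lane of a SIMD group reads the same input operands, and scalar instructions that share a PC are packed together so that each one occupies a single lane of a wave.

```python
from collections import defaultdict


def is_scalar(operands_per_lane):
    """Return True when every lane reads the same input operands.

    Assumes a non-empty list with one operand tuple per active lane.
    """
    first = operands_per_lane[0]
    return all(ops == first for ops in operands_per_lane)


def form_scalar_waves(pending, simd_width):
    """Batch scalar instructions that share a PC into waves.

    pending: list of (group_id, pc, operands_per_lane) tuples (hypothetical layout).
    Returns (waves, vector), where each wave is (pc, [(group_id, operands), ...])
    holding at most simd_width entries, and vector lists the (group_id, pc)
    pairs that are not scalar and still need full SIMD execution.
    """
    by_pc = defaultdict(list)
    vector = []
    for group_id, pc, lanes in pending:
        if is_scalar(lanes):
            # One lane's operands are enough to represent the whole group.
            by_pc[pc].append((group_id, lanes[0]))
        else:
            vector.append((group_id, pc))

    waves = []
    for pc, entries in by_pc.items():
        # Split into chunks so that no wave exceeds the SIMD width.
        for start in range(0, len(entries), simd_width):
            waves.append((pc, entries[start:start + simd_width]))
    return waves, vector


pending = [
    (0, 0x10, [(1, 2)] * 4),                        # scalar
    (1, 0x10, [(3, 4)] * 4),                        # scalar, same PC as group 0
    (2, 0x10, [(1, 2), (1, 3), (1, 2), (1, 2)]),    # lanes differ: not scalar
    (3, 0x20, [(5, 5)] * 4),                        # scalar, different PC
]
waves, vector = form_scalar_waves(pending, simd_width=4)
```

In this toy input, groups 0 and 1 share a wave at PC `0x10`, group 3 forms its own wave at PC `0x20`, and group 2 remains a regular SIMD instruction.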
Keywords :
graphics processing units; instruction sets; multiprocessing systems; pipeline processing; GPU benchmarks; SIMD execution efficiency; batch scalar operations; dynamically-formed scalar waves; program control flow; program counter; scalar instructions; scalar waving; Benchmark testing; Computer architecture; Graphics processing units; Instruction sets; Pipelines; Registers; GPU; Redundant Computation; SIMD Efficiency; Scalar Waving;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel and Distributed Processing Symposium, 2014 IEEE 28th International
Conference_Location :
Phoenix, AZ
ISSN :
1530-2075
Print_ISBN :
978-1-4799-3799-8
Type :
conf
DOI :
10.1109/IPDPS.2014.22
Filename :
6877246