DocumentCode
3537807
Title
An Instruction-Level Energy Estimation and Optimization Methodology for GPU
Author
Wang, Yue ; Ranganathan, Nagarajan
Author_Institution
Dept. of Comput. Sci. & Eng., Univ. of South Florida, Tampa, FL, USA
fYear
2011
fDate
Aug. 31 2011-Sept. 2 2011
Firstpage
621
Lastpage
628
Abstract
GPU architectures are now widely used in research on computer graphics and other scientific computing areas. The parallel computing capability of the GPU provides performance benefits for many programs. However, as the degree of parallelism grows, the number of active GPU cores required for execution also increases, and the resulting rise in energy consumption has begun to draw attention. Previous research [1] reveals that, for a given multicore program, energy consumption first falls and then rises as the number of active cores increases. This means that energy consumption can be minimized if the number of active cores is properly configured. In this paper, we develop an instruction-level prediction mechanism to estimate the energy consumption of a given program under different numbers of cores. The prediction is based on a profile of the Parallel Thread Execution (PTX) [2] code generated during compilation of the original program. With the help of this mechanism, the energy-optimal number of cores can be found during compilation and used in execution, replacing the one given by the programmer. Tests have been carried out on several NVIDIA CUDA [10] benchmarks. The results show that energy consumption is minimized without losing much performance. With the predicted energy-optimal number of active cores, the energy savings for the selected benchmarks range from 7.31% to 11.76% on average, with a worst-case performance loss of 4.92%.
Keywords
computer graphic equipment; coprocessors; parallel architectures; GPU architecture; NVIDIA CUDA benchmarks; instruction-level energy estimation; instruction-level prediction mechanism; minimum energy consumption; optimization methodology; parallel computing; parallel thread execution codes; Benchmark testing; Energy consumption; Energy measurement; Graphics processing unit; Instruction sets; Multicore processing; Registers; GPU; energy; energy estimation; energy optimization; instruction-level; multicore;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer and Information Technology (CIT), 2011 IEEE 11th International Conference on
Conference_Location
Pafos
Print_ISBN
978-1-4577-0383-6
Electronic_ISBN
978-0-7695-4388-8
Type
conf
DOI
10.1109/CIT.2011.69
Filename
6036835