DocumentCode :
3682578
Title :
Fast Computational GPU Design with GT-Pin
Author :
Melanie Kambadur;Sunpyo Hong;Juan Cabral;Harish Patil;Chi-Keung Luk;Sohaib Sajid;Martha A. Kim
Author_Institution :
Columbia Univ., New York, NY, USA
fYear :
2015
Firstpage :
76
Lastpage :
86
Abstract :
As computational applications become common for graphics processing units, new hardware designs must be developed to meet the unique needs of these workloads. Performance simulation is an important step in appraising how well a candidate design will serve these needs, but unfortunately, computational GPU programs are so large that simulating them in detail is prohibitively slow. This work addresses the need to understand very large computational GPU programs in three ways. First, it introduces a fast tracing tool that uses binary instrumentation for in-depth analyses of native executions on existing architectures. Second, it characterizes 25 commercial and benchmark OpenCL applications, which average 308 billion GPU instructions apiece and are by far the largest benchmarks that have been natively profiled at this level of detail. Third, it accelerates simulation of future hardware by pinpointing small subsets of OpenCL applications that can be simulated as representative surrogates in lieu of full-length programs. Our fast selection method requires no simulation itself and allows the user to navigate the accuracy/simulation speed trade-off space, from extremely accurate with reasonable speedups (35X increase in simulation speed for 0.3% error) to reasonably accurate with extreme speedups (223X simulation speedup for 3.0% error).
Keywords :
"Graphics processing units","Kernel","Benchmark testing","Hardware","Synchronization","Runtime","Computational modeling"
Publisher :
IEEE
Conference_Title :
Workload Characterization (IISWC), 2015 IEEE International Symposium on
Type :
conf
DOI :
10.1109/IISWC.2015.14
Filename :
7314149