DocumentCode :
3759155
Title :
An Efficient Vectorization Approach to Nested Thread-level Parallelism for CUDA GPUs
Author :
Shixiong Xu;David Gregg
Author_Institution :
Software Tools Group, Univ. of Dublin, Dublin, Ireland
Year :
2015
Firstpage :
488
Lastpage :
489
Abstract :
Nested thread-level parallelism (TLP) is pervasive in real applications. For example, about 74% (14 out of 19) of the applications in the Rodinia benchmark suite for heterogeneous accelerators contain kernels with nested thread-level parallelism. Efficiently mapping this nested parallelism to GPU threads during C-to-CUDA compilation (from OpenACC in this paper) is becoming increasingly important. The mapping problem is twofold: it requires both suitable execution models and efficient mapping strategies for the nested parallelism.
Keywords :
"Graphics processing units","Message systems","Parallel processing","Parallel architectures","Software engineering","Benchmark testing","Kernel"
Publisher :
IEEE
Conference_Title :
Parallel Architecture and Compilation (PACT), 2015 International Conference on
ISSN :
1089-795X
Type :
conf
DOI :
10.1109/PACT.2015.56
Filename :
7429334