Regional Information Center for Science and Technology (RICeST) - An Efficient Vectorization Approach to Nested Thread-level Parallelism for CUDA GPUs

DocumentCode :

3759155

Title :

An Efficient Vectorization Approach to Nested Thread-level Parallelism for CUDA GPUs

Author :

Shixiong Xu;David Gregg

Author_Institution :

Software Tools Group, Univ. of Dublin, Dublin, Ireland

Year :

2015

Firstpage :

488

Lastpage :

489

Abstract :

Nested thread-level parallelism (TLP) is pervasive in real applications. For example, 75% (14 out of 19) of the applications in the Rodinia benchmark suite for heterogeneous accelerators contain kernels with nested thread-level parallelism. Efficiently mapping this nested parallelism to GPU threads in C-to-CUDA compilation (OpenACC in this paper) is becoming increasingly important. The mapping problem is twofold: choosing suitable execution models and finding efficient mapping strategies for the nested parallelism.

Keywords :

"Graphics processing units","Message systems","Parallel processing","Parallel architectures","Software engineering","Benchmark testing","Kernel"

Publisher :

IEEE

Conference_Title :

Parallel Architecture and Compilation (PACT), 2015 International Conference on

ISSN :

1089-795X

Type :

conf

DOI :

10.1109/PACT.2015.56

Filename :

7429334

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3759155