• DocumentCode
    3759155
  • Title

    An Efficient Vectorization Approach to Nested Thread-level Parallelism for CUDA GPUs

  • Author

    Shixiong Xu;David Gregg

  • Author_Institution
    Software Tools Group, Univ. of Dublin, Dublin, Ireland
  • fYear
    2015
  • Firstpage
    488
  • Lastpage
    489
  • Abstract
    Nested thread-level parallelism (TLP) is pervasive in real applications. For example, 75% (14 out of 19) of the applications in the Rodinia benchmark suite for heterogeneous accelerators contain kernels with nested thread-level parallelism. Efficiently mapping this nested parallelism to GPU threads during C-to-CUDA compilation (from OpenACC in this paper) is becoming increasingly important. The mapping problem is twofold: choosing suitable execution models and finding efficient mapping strategies for the nested parallelism.
  • Keywords
    "Graphics processing units","Message systems","Parallel processing","Parallel architectures","Software engineering","Benchmark testing","Kernel"
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Architecture and Compilation (PACT), 2015 International Conference on
  • ISSN
    1089-795X
  • Type

    conf

  • DOI
    10.1109/PACT.2015.56
  • Filename
    7429334