• DocumentCode
    1954830
  • Title
    Auto-tuning a high-level language targeted to GPU codes
  • Author
    Grauer-Gray, Scott ; Xu, Lifan ; Searles, Robert ; Ayalasomayajula, Sudhee ; Cavazos, John
  • Author_Institution
    Comput. & Inf. Sci., Univ. of Delaware, Newark, DE, USA
  • fYear
    2012
  • fDate
    13-14 May 2012
  • Firstpage
    1
  • Lastpage
    10
  • Abstract
    Determining the best set of optimizations to apply to a kernel to be executed on the graphics processing unit (GPU) is a challenging problem. There are large sets of possible optimization configurations that can be applied, and many applications have multiple kernels. Each kernel may require a specific configuration to achieve the best performance, and moving an application to new hardware often requires a new optimization configuration for each kernel. In this work, we apply optimizations to GPU code using HMPP, a high-level directive-based language and source-to-source compiler that can generate CUDA / OpenCL code. However, programming with high-level languages may mean a loss of performance compared to using low-level languages. Our work shows that it is possible to improve the performance of a high-level language by using auto-tuning. We perform auto-tuning on a large optimization space on GPU kernels, focusing on loop permutation, loop unrolling, tiling, and specifying which loop(s) to parallelize, and show results on convolution kernels, codes in the PolyBench suite, and an implementation of belief propagation for stereo vision. The results show that our auto-tuned HMPP-generated implementations are significantly faster than the default HMPP implementation and can meet or exceed the performance of manually coded CUDA / OpenCL implementations.
  • Keywords
    graphics processing units; high level languages; multiprocessing systems; parallel architectures; parallel programming; program compilers; stereo image processing; CUDA; GPU codes; HMPP; OpenCL code; PolyBench suite; autotuning; belief propagation; convolution kernels; graphics processing unit; high-level directive-based language; hybrid multicore parallel programming; loop permutation; loop tiling; loop unrolling; optimization configuration; source-to-source compiler; stereo vision; Abstracts; Benchmark testing; Graphics processing unit; Nickel; Programming; Tiles; Auto-tuning; Belief Propagation; CUDA; GPU; OpenCL; Optimization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Innovative Parallel Computing (InPar), 2012
  • Conference_Location
    San Jose, CA
  • Print_ISBN
    978-1-4673-2632-2
  • Electronic_ISBN
    978-1-4673-2631-5
  • Type
    conf
  • DOI
    10.1109/InPar.2012.6339595
  • Filename
    6339595