• DocumentCode
    14663
  • Title

    Autogeneration and Autotuning of 3D Stencil Codes on Homogeneous and Heterogeneous GPU Clusters

  • Author

    Zhang, Yongpeng ; Mueller, Frank

  • Author_Institution
    Dept. of Comput. Sci., North Carolina State Univ., Raleigh, NC, USA
  • Volume
    24
  • Issue
    3
  • fYear
    2013
  • fDate
    March 2013
  • Firstpage
    417
  • Lastpage
    427
  • Abstract
    This paper develops and evaluates search and optimization techniques for autotuning 3D stencil (nearest neighbor) computations on GPUs. Observations indicate that parameter tuning is necessary for heterogeneous GPUs to achieve optimal performance with respect to a search space. Our proposed framework takes a most concise specification of stencil behavior from the user as a single formula, autogenerates tunable code from it, systematically searches for the best configuration and generates the code with optimal parameter configurations for different GPUs. This autotuning approach guarantees adaptive performance for different generations of GPUs while greatly enhancing programmer productivity. Experimental results show that the delivered floating point performance is very close to previous handcrafted work and outperforms other autotuned stencil codes by a large margin. Furthermore, heterogeneous GPU clusters are shown to exhibit the highest performance for dissimilar tuning parameters leveraging proportional partitioning relative to single-GPU performance.
  • Keywords
    graphics processing units; 3D stencil code autogeneration; 3D stencil code autotuning; floating point performance; heterogeneous GPU clusters; homogeneous GPU clusters; nearest neighbor computations; optimization techniques; parameter tuning; search space; search techniques; single-GPU performance; stencil behavior specification; Arrays; Graphics processing unit; Instruction sets; Kernel; Optimization; Three dimensional displays; Tuning; Accelerators; GPGPU programming; GPU clusters; stencil codes
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type

    jour

  • DOI
    10.1109/TPDS.2012.160
  • Filename
    6205746