  • DocumentCode
    692859
  • Title

    A large-scale cross-architecture evaluation of thread-coarsening

  • Author

    Magni, Alessandro ; Dubach, Christophe ; O'Boyle, Michael F. P.

  • Author_Institution
    Univ. of Edinburgh, Edinburgh, UK
  • fYear
    2013
  • fDate
    17-22 Nov. 2013
  • Firstpage
    1
  • Lastpage
    11
  • Abstract
    OpenCL has become the de-facto data parallel programming model for parallel devices in today's high-performance supercomputers. OpenCL was designed with the goal of guaranteeing program portability across hardware from different vendors. However, achieving good performance is hard, requiring manual tuning of the program and expert knowledge of each target device. In this paper we consider a data parallel compiler transformation - thread-coarsening - and evaluate its effects across a range of devices by developing a source-to-source OpenCL compiler based on LLVM. We thoroughly evaluate this transformation on 17 benchmarks and five platforms with different coarsening parameters giving over 43,000 different experiments. We achieve speedups over 9x on individual applications and average speedups ranging from 1.15x on the Nvidia Kepler GPU to 1.50x on the AMD Cypress GPU. Finally, we use statistical regression to analyse and explain program performance in terms of hardware-based performance counters.
  • Keywords
    graphics processing units; multi-threading; program compilers; regression analysis; software architecture; software performance evaluation; software portability; AMD Cypress GPU; LLVM; Nvidia Kepler GPU; data parallel compiler transformation; de-facto data parallel programming model; hardware-based performance counters; high-performance supercomputers; large-scale cross-architecture evaluation; program performance; program portability; source-to-source OpenCL compiler; statistical regression; thread-coarsening parameters; Benchmark testing; Graphics processing units; Hardware; Instruction sets; Kernel; Multicore processing; Performance evaluation; GPU; OpenCL; Regression trees; Thread coarsening;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computing, Networking, Storage and Analysis (SC), 2013 International Conference for
  • Conference_Location
    Denver, CO
  • Print_ISBN
    978-1-4503-2378-9
  • Type

    conf

  • DOI
    10.1145/2503210.2503268
  • Filename
    6877444