  • DocumentCode
    656140
  • Title
    Adaptive Runtime Selection for GPU
  • Author
    Dollinger, Jean-Francois ; Loechner, Vincent
  • Author_Institution
    ICube Lab., Strasbourg Univ., Strasbourg, France
  • fYear
    2013
  • fDate
    1-4 Oct. 2013
  • Firstpage
    70
  • Lastpage
    79
  • Abstract
    It is often hard to predict the performance of statically generated code. Hardware availability, hardware specification and problem size may change from one execution context to another. The main contribution of this work is an entirely automatic method that predicts the execution times of semantically equivalent versions of affine loop nests on GPUs, and then runs the best-performing one on GPU or CPU. To make accurate predictions, our framework relies on three consecutive stages: static code generation, offline profiling and online prediction. Different versions are statically generated by PPCG, a source-to-source polyhedral compiler that generates CUDA code from static control loops written in C. The code versions differ in their block sizes, tiling and parallel schedule. The profiling code carries out the required measurements on the target machine: throughput between host and device memory, and execution time of the kernels with various parameters. At runtime, we rely on those results to calculate a predicted execution time on GPU. This is followed by a "fastest wins" algorithm that runs instances of the target code concurrently on CPU and GPU; the first to complete kills the other. We validate this proposal on the polyhedral benchmark suite, showing that the predictions are accurate and that the runtime selection is effective on two different architectures.
  • Keywords
    graphics processing units; performance evaluation; CUDA code; GPU; PPCG; adaptive runtime selection; affine loop nests; automatic method; device memory; fastest wins algorithm; hardware availability; hardware specification; kernels; offline profiling; online prediction; parallel schedule; polyhedral benchmark suite; source-to-source polyhedral compiler; static code generation; static control loops; target code; Bandwidth; Context; Graphics processing units; Hardware; Instruction sets; Kernel; Runtime; CPU-GPU fastest wins; multiversioning; offline profiling; performance prediction; polyhedral model
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Processing (ICPP), 2013 42nd International Conference on
  • Conference_Location
    Lyon
  • ISSN
    0190-3918
  • Type
    conf
  • DOI
    10.1109/ICPP.2013.16
  • Filename
    6687340