• DocumentCode
    2927180
  • Title
    A statistical performance prediction model for OpenCL kernels on NVIDIA GPUs
  • Author
    Karami, Armine ; Mirsoleimani, Sayyed Ali ; Khunjush, Farshad
  • Author_Institution
    Sch. of Electr. & Comput. Eng., Shiraz Univ., Shiraz, Iran
  • fYear
    2013
  • fDate
    30-31 Oct. 2013
  • Firstpage
    15
  • Lastpage
    22
  • Abstract
    Understanding the performance bottlenecks of applications in high-performance computing can lead to dramatic improvements in application performance. For example, a key problem in GPU programming is finding performance bottlenecks and resolving them to reach the best possible performance. In GPU architectures, these bottlenecks span a variety of factors such as memory access latency, branch divergence, utilization, and the amount of available parallelism. In addition, simple profiling cannot reveal the relations between these bottlenecks. In this paper, we propose a statistical performance model that not only helps find bottlenecks but also shows the relations between them, which is not possible with a profiler alone. The OpenCL programming standard can be used on a variety of platforms (e.g., CPUs and GPUs); therefore, a program written for one platform can be ported to other platforms with minimal effort. As a result, we selected the OpenCL programming standard to design our performance model for NVIDIA GPUs. To do this, we first measure the values of the GPU's performance counters for the selected benchmarks. Then, using the measured results and applying a regression model and principal component analysis, we develop a model that shows how different GPU parameters account for an application's performance bottlenecks. Our results show that the proposed model can predict application behavior with 91% accuracy. Moreover, the proposed model is able to characterize unknown applications based on their performance similarities with an existing database of benchmarks in order to predict their likely performance bottlenecks.
  • Keywords
    graphics processing units; parallel architectures; principal component analysis; regression analysis; GPU architectures; GPU parameters; GPU performance counters; GPU programming; NVIDIA GPU; OpenCL kernels; OpenCL programming standard; branch divergence; high performance computing; memory access latency; parallelism; principle component analysis; profiler; regression model; statistical performance model; statistical performance prediction model; Analytical models; Benchmark testing; Graphics processing units; Kernel; Predictive models; Principal component analysis; Programming; GPU; OpenCL; Statistical Performance Model;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Architecture and Digital Systems (CADS), 2013 17th CSI International Symposium on
  • Conference_Location
    Tehran
  • Print_ISBN
    978-1-4799-0562-1
  • Type
    conf
  • DOI
    10.1109/CADS.2013.6714232
  • Filename
    6714232
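
The abstract outlines a pipeline in which GPU performance counters are measured and then analyzed with principal component analysis and a regression model. The following is a minimal sketch of that general idea, not the authors' implementation. The counter names, the synthetic data, the choice to keep only the first principal component, and the single-variable least-squares fit are all assumptions made for illustration; the record does not specify any of them.

```python
import math
import random

def standardize(rows):
    """Center each column to zero mean and scale it to unit variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_component(cov, iters=500):
    """First principal component (unit vector) and its eigenvalue,
    found by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cv = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(v[i] * cv[i] for i in range(d))
    return v, eigenvalue

def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def r_squared(x, y, slope, intercept):
    """Coefficient of determination of the fitted line."""
    my = sum(y) / len(y)
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in for per-kernel measurements (made-up values, not from
# the paper). Hypothetical counter columns: memory stalls, divergent
# branches, occupancy. All three are driven by one hidden factor t.
random.seed(0)
counters, runtimes = [], []
for _ in range(40):
    t = random.uniform(0.0, 10.0)
    counters.append([t + random.gauss(0, 0.2),
                     2 * t + random.gauss(0, 0.2),
                     -t + random.gauss(0, 0.2)])
    runtimes.append(3 * t + 5 + random.gauss(0, 0.3))

z = standardize(counters)
pc1, eigenvalue = top_component(covariance(z))
explained = eigenvalue / len(pc1)  # share of total variance captured by PC1
scores = [sum(a * b for a, b in zip(row, pc1)) for row in z]

slope, intercept = fit_line(scores, runtimes)
r2 = r_squared(scores, runtimes, slope, intercept)
print("PC1 loadings:", [round(x, 3) for x in pc1])
print("explained variance ratio:", round(explained, 3))
print("R^2 of runtime regressed on PC1:", round(r2, 3))
```

In this sketch the PC1 loadings indicate which counters vary together, and the regression relates that combined factor to runtime. A real study would substitute measured counter values and would likely retain more than one component.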