• DocumentCode
    167360
  • Title

    A Linear Performance-Breakdown Model for GPU Programming Optimization Guidance

  • Author

    Chapa M, Mario A. ; Hiroyuki, Sato

  • Author_Institution
    Dept. of Electr. Eng. & Inf. Sci., Univ. of Tokyo, Tokyo, Japan
  • fYear
    2014
  • fDate
    19-23 May 2014
  • Firstpage
    596
  • Lastpage
    603
  • Abstract
    Graphics Processing Units (GPUs) are used as computing accelerators. Nevertheless, writing efficient GPU programs is a difficult and time-consuming task. In this paper we present the Linear Performance Breakdown Model (LBPM), an analytic model that breaks down the execution time of a GPU kernel program into the three major components that affect its running time. The model can be used as a tool that provides guidelines for detecting performance bottlenecks. Our approach incorporates three elements: the Global-to-Shared Memory Time slice, the Shared-to-Private Time slice, and the Processing Units Time slice. These three factors are integrated into a performance model formula by applying the Normalized Least Squares Method (NLSM). The resulting coefficients are used to construct a performance breakdown graph that reveals the effect of each element on the total execution time of the kernel program. We demonstrate the proposed model on two common numerical routines, Single-Precision General Matrix Multiplication (SGMM) and Fast Fourier Transform (FFT), and apply it to results obtained from two GPU devices: an A8-3870 AMD Accelerated Processing Unit (APU) and a GTX 660 Nvidia GPU.
  • Keywords
    fast Fourier transforms; graph theory; graphics processing units; least squares approximations; matrix multiplication; shared memory systems; software performance evaluation; A8-3870 AMD accelerated processing unit; APU; FFT; GPU devices; GPU kernel program execution; GPU programming optimization guidance; GTX 660 Nvidia GPU; LBPM; NLSM; SGMM; analytic model; computing accelerators; fast Fourier transform; global-to-shared memory time slice; graphic processing units; kernel program; linear performance-breakdown model; normalized least squares method; performance breakdown graph; processing unit time slice; shared-to-private time slice; single-precision general matrix multiplication; time consuming task; Computational modeling; Computer architecture; Graphics processing units; Kernel; Performance evaluation; Programming; Registers; GPGPU; Modeling; OpenCL;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International
  • Conference_Location
    Phoenix, AZ
  • Print_ISBN
    978-1-4799-4117-9
  • Type

    conf

  • DOI
    10.1109/IPDPSW.2014.70
  • Filename
    6969440