• DocumentCode
    2515776
  • Title

    An integer programming framework for optimizing shared memory use on GPUs

  • Author

    Ma, Wenjing ; Agrawal, Gagan

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
  • fYear
    2010
  • fDate
    19-22 Dec. 2010
  • Firstpage
    1
  • Lastpage
    10
  • Abstract
    General purpose computing using GPUs is becoming increasingly popular because of GPUs' extremely favorable performance/price ratio. Besides application development using CUDA, automatic code generation for GPUs is also receiving attention. Like standard processors, GPUs have a memory hierarchy, which must be carefully optimized for in order to achieve efficient execution. Specifically, modern NVIDIA GPUs have a very small programmable cache, referred to as shared memory, accesses to which are nearly 100 to 150 times faster than accesses to the regular device memory. An automatically generated or hand-written CUDA program can explicitly control which variables and array sections are allocated on the shared memory at any point during execution. This, however, leads to a difficult optimization problem. In this paper, we formulate and solve the shared memory allocation problem as an integer programming problem. We present a global (intraprocedural) framework which can model structured control flow and is not restricted to a single loop nest. We consider allocation of scalars, arrays, and array sections on shared memory. We also briefly show how our framework can suggest useful loop transformations to further improve performance. Our experiments using several non-scientific applications show that our integer programming framework outperforms a recently published heuristic method, and that our loop transformations also improve performance for many applications.
  • Keywords
    cache storage; coprocessors; integer programming; shared memory systems; CUDA program; NVIDIA GPU; automatic code generation; general purpose computing; integer programming; memory hierarchy; optimizing shared memory use; programmable cache; shared memory allocation problem; structured control flow; Arrays; Graphics processing unit; Instruction sets; Linear programming; Registers; Resource management
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computing (HiPC), 2010 International Conference on
  • Conference_Location
    Dona Paula
  • Print_ISBN
    978-1-4244-8518-5
  • Electronic_ISBN
    978-1-4244-8519-2
  • Type

    conf

  • DOI
    10.1109/HIPC.2010.5713187
  • Filename
    5713187