• DocumentCode
    626670
  • Title
    DRAM access reduction in GPUs by thread-block scheduling for overlapped data reuse
  • Author
    Seungyeol Lee; Wonyong Sung
  • Author_Institution
    Dept. of Electr. Eng., Seoul Nat. Univ., Seoul, South Korea
  • fYear
    2013
  • fDate
    19-23 May 2013
  • Firstpage
    901
  • Lastpage
    904
  • Abstract
    General-purpose graphics processing units (GPGPUs) show very high throughput when executing parallel programs. However, they usually demand very large DRAM bandwidth and consume considerable power for memory access. Although recent high-performance GPGPUs include an L2 cache to absorb some DRAM accesses, the cache hit ratio is rarely high because of the limited cache size. We propose a GPU thread-block scheduling method that makes better use of the L2 cache and reduces DRAM accesses. The method exploits inter-block locality when scheduling GPU thread-blocks, and it can be implemented easily by modifying application programs. Applied to the Hotspot benchmark programs, the technique reduces DRAM accesses by up to 39%.
  • Keywords
    DRAM chips; cache storage; graphics processing units; scheduling; DRAM access reduction; DRAM bandwidth; DRAM memory access; GPU; Hotspot benchmark programs; L2 cache; application programs; cache hit ratio; cache size; general purpose graphics processing units; inter-block locality; overlapped data reuse; parallel programs; thread-block scheduling; Cache memory; Computer architecture; Graphics processing units; Instruction sets; Message systems; Random access memory; Strips;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Circuits and Systems (ISCAS), 2013 IEEE International Symposium on
  • Conference_Location
    Beijing
  • ISSN
    0271-4302
  • Print_ISBN
    978-1-4673-5760-9
  • Type
    conf
  • DOI
    10.1109/ISCAS.2013.6571993
  • Filename
    6571993