DRAM access reduction in GPUs by thread-block scheduling for overlapped data reuse

Author

Seungyeol Lee ; Wonyong Sung

Author_Institution

Dept. of Electr. Eng., Seoul Nat. Univ., Seoul, South Korea

fYear

2013

fDate

19-23 May 2013

Firstpage

901

Lastpage

904

Abstract

General-purpose graphics processing units (GPGPUs) achieve very high throughput when executing parallel programs. However, they usually demand very large DRAM bandwidth and consume considerable power for memory access. Although recent high-performance GPGPUs are equipped with an L2 cache to absorb some of the DRAM accesses, the cache hit ratio is rarely high because of the limited cache size. We propose a GPU thread-block scheduling method that makes better use of the L2 cache and reduces DRAM accesses. The method exploits inter-block locality when scheduling GPU thread blocks, and it can be implemented easily by modifying application programs. Applied to the Hotspot benchmark programs, the technique reduces DRAM accesses by up to 39%.
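
The abstract does not spell out the scheduling order, so the following is only a rough sketch of the general idea, not the authors' method. It assumes a stencil-style workload in which each thread block reads its own data tile plus the four neighboring tiles, and it models the L2 cache as a small LRU set of tiles. The strip-wise visiting order, the function names, the grid size, the strip width, and the cache capacity are all illustrative assumptions. Under those assumptions, visiting blocks in narrow vertical strips keeps overlapping tiles in the cache longer than a plain row-major order, so fewer tile reads fall through to DRAM.

```python
from collections import OrderedDict


def row_major_order(grid_w, grid_h):
    """Default-style order: sweep each full row of thread blocks in turn."""
    return [(x, y) for y in range(grid_h) for x in range(grid_w)]


def strip_order(grid_w, grid_h, strip_w):
    """Hypothetical remapped order: finish one narrow vertical strip
    (all rows) before moving on to the next strip."""
    order = []
    for x0 in range(0, grid_w, strip_w):
        for y in range(grid_h):
            for x in range(x0, min(x0 + strip_w, grid_w)):
                order.append((x, y))
    return order


def tiles_read(block, grid_w, grid_h):
    """Assumed access pattern: own tile plus left, right, up, down neighbors."""
    x, y = block
    candidates = [(x, y), (x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(tx, ty) for tx, ty in candidates
            if 0 <= tx < grid_w and 0 <= ty < grid_h]


def count_misses(order, grid_w, grid_h, capacity):
    """Count tile reads that miss a toy LRU cache holding `capacity` tiles."""
    cache = OrderedDict()
    misses = 0
    for block in order:
        for tile in tiles_read(block, grid_w, grid_h):
            if tile in cache:
                cache.move_to_end(tile)        # refresh LRU position on a hit
            else:
                misses += 1                    # modeled as one DRAM access
                cache[tile] = True
                if len(cache) > capacity:
                    cache.popitem(last=False)  # evict least recently used tile
    return misses


# Illustrative sizes only; none of these values come from the paper.
GRID_W, GRID_H, STRIP_W, CAPACITY = 16, 16, 4, 24

row_blocks = row_major_order(GRID_W, GRID_H)
strip_blocks = strip_order(GRID_W, GRID_H, STRIP_W)
row_misses = count_misses(row_blocks, GRID_W, GRID_H, CAPACITY)
strip_misses = count_misses(strip_blocks, GRID_W, GRID_H, CAPACITY)

print("row-major misses:", row_misses)
print("strip-order misses:", strip_misses)
```

In a real CUDA program, a comparable remapping would typically be done inside the kernel by computing a logical block index from the hardware block index, which is consistent with the abstract's remark that the method only requires modifying application programs. The exact mapping used in the paper is not given here.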

Keywords

DRAM chips; cache storage; graphics processing units; scheduling; DRAM access reduction; DRAM bandwidth; DRAM memory access; GPU; Hotspot benchmark programs; L2 cache; application programs; cache hit ratio; cache size; general purpose graphics processing units; inter-block locality; overlapped data reuse; parallel programs; thread-block scheduling; Cache memory; Computer architecture; Graphics processing units; Instruction sets; Message systems; Random access memory; Strips

fLanguage

English

Publisher

IEEE

Conference_Titel

Circuits and Systems (ISCAS), 2013 IEEE International Symposium on

Conference_Location

Beijing

ISSN

0271-4302

Print_ISBN

978-1-4673-5760-9

Type

conf

DOI

10.1109/ISCAS.2013.6571993

Filename

6571993