DocumentCode
626670
Title
DRAM access reduction in GPUs by thread-block scheduling for overlapped data reuse
Author
Seungyeol Lee ; Wonyong Sung
Author_Institution
Dept. of Electr. Eng., Seoul Nat. Univ., Seoul, South Korea
fYear
2013
fDate
19-23 May 2013
Firstpage
901
Lastpage
904
Abstract
General Purpose Graphics Processing Units (GPGPUs) show very high throughput when executing parallel programs. However, they usually demand very large DRAM bandwidth and consume considerable power for memory access. Although recent high-performance GPGPUs are equipped with an L2 cache to absorb some DRAM accesses, the cache hit ratio can hardly be very high because of the limited cache size. We propose a GPU thread-block scheduling method that better utilizes the L2 cache and reduces DRAM accesses. This method exploits inter-block locality when scheduling GPU thread-blocks, and it can easily be implemented by modifying application programs. Applied to the Hotspot benchmark programs, the technique reduces DRAM accesses by up to 39%.
Keywords
DRAM chips; cache storage; graphics processing units; scheduling; DRAM access reduction; DRAM bandwidth; DRAM memory access; GPU; Hotspot benchmark programs; L2 cache; application programs; cache hit ratio; cache size; general purpose graphics processing units; inter-block locality; overlapped data reuse; parallel programs; thread-block scheduling; Cache memory; Computer architecture; Graphics processing units; Instruction sets; Message systems; Random access memory; Strips;
fLanguage
English
Publisher
ieee
Conference_Title
Circuits and Systems (ISCAS), 2013 IEEE International Symposium on
Conference_Location
Beijing
ISSN
0271-4302
Print_ISBN
978-1-4673-5760-9
Type
conf
DOI
10.1109/ISCAS.2013.6571993
Filename
6571993