DocumentCode :
3496774
Title :
An energy efficient GPGPU memory hierarchy with tiny incoherent caches
Author :
Sankaranarayanan, Alamelu ; Ardestani, Ehsan K. ; Briz, Jose Luis ; Renau, Jose
Author_Institution :
Dept. of Comput. Eng., Univ. of California Santa Cruz, Santa Cruz, CA, USA
fYear :
2013
fDate :
4-6 Sept. 2013
Firstpage :
9
Lastpage :
14
Abstract :
With progressive generations and the ever-increasing promise of computing power, GPGPUs have been quickly growing in size, and at the same time, energy consumption has become a major bottleneck for them. The first level data cache and the scratchpad memory are critical to the performance of a GPGPU, but they are extremely energy inefficient due to the large number of cores they need to serve. This problem could be mitigated by introducing a cache higher up in the hierarchy that services fewer cores, but this introduces cache coherency issues that may become very significant, especially for a GPGPU with hundreds of thousands of in-flight threads. In this paper, we propose adding incoherent tinyCaches between each lane in an SM and the first level data cache that is currently shared by all the lanes in an SM. In a normal multiprocessor, this would require hardware cache coherence between all the SM lanes capable of handling hundreds of thousands of threads. Our incoherent tinyCache architecture exploits certain unique features of the CUDA/OpenCL programming model to avoid complex coherence schemes. This tinyCache is able to filter out 62% of memory requests that would otherwise need to be serviced by the DL1G, and almost 81% of scratchpad memory requests, allowing us to achieve a 37% energy reduction in the on-chip memory hierarchy. We evaluate the tinyCache for different memory patterns and show that it is beneficial in most cases.
Keywords :
cache storage; graphics processing units; multiprocessing systems; CUDA/OpenCL programming model; energy efficient GPGPU memory hierarchy; first level data cache; hardware cache coherence; incoherent tinyCache architecture; memory patterns; multiprocessors; on-chip memory hierarchy; scratchpad memory; Benchmark testing; Coherence; Computer architecture; Graphics processing units; Instruction sets; Registers; System-on-chip; Caches; Energy-efficiency; GPGPUs; Memory hierarchy;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Low Power Electronics and Design (ISLPED), 2013 IEEE International Symposium on
Conference_Location :
Beijing
Print_ISBN :
978-1-4799-1234-6
Type :
conf
DOI :
10.1109/ISLPED.2013.6629259
Filename :
6629259