Title :
The Sharing Tracker: Using Ideas from Cache Coherence Hardware to Reduce Off-Chip Memory Traffic with Non-Coherent Caches
Author :
Tarjan, David ; Skadron, Kevin
Author_Institution :
Dept. of Comput. Sci., Univ. of Virginia, Charlottesville, VA, USA
Abstract :
Graphics Processing Units (GPUs) have recently emerged as a new platform for high performance, general-purpose computing. Because current GPUs employ deep multithreading to hide latency, they only have small, per-core caches to capture reuse and eliminate unnecessary off-chip accesses. This paper shows that for general-purpose workloads, the ability to copy cache lines between private caches captures inter-core temporal locality and provides substantial reductions in off-chip bandwidth requirements. Unlike hardware cache coherence, a sharing tracker only needs to track cache lines in the private caches imprecisely, because it is only a performance hint. This simplifies the implementation and is so effective at capturing inter-core reuse that the L2 can be eliminated entirely. The sharing tracker is motivated by but not specific to the GPU and could be used in other manycore organizations.
Keywords :
cache storage; computer graphic equipment; coprocessors; multi-threading; L2; cache coherence hardware; cache lines; general-purpose computing; graphics processing units; intercore temporal locality; multithreading; noncoherent caches; off-chip bandwidth; off-chip memory traffic; private caches; sharing tracker; Bandwidth; Coherence; Computer architecture; Graphics processing unit; Hardware; Instruction sets; Organizations;
Conference_Titel :
High Performance Computing, Networking, Storage and Analysis (SC), 2010 International Conference for
Conference_Location :
New Orleans, LA
Print_ISBN :
978-1-4244-7557-5
Electronic_ISBN :
978-1-4244-7558-2