Title :
Exploiting set-level non-uniformity of capacity demand to enhance CMP cooperative caching
Author :
Zhan, Dongyuan ; Jiang, Hong ; Seth, Sharad C.
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. of Nebraska - Lincoln, Lincoln, NE, USA
Abstract :
As the Memory Wall remains a bottleneck for Chip Multiprocessors (CMPs), the effective management of CMP last-level caches is of paramount importance in minimizing expensive off-chip memory accesses. For CMPs with private last-level caches, Cooperative Caching (CC) has been proposed to enable capacity sharing among private caches by spilling an evicted block from one cache to another. But this eviction-driven CC does not necessarily improve cache performance, since it implicitly favors applications with many block evictions regardless of their real capacity demand. The recent Dynamic Spill-Receive (DSR) paradigm improves CC by giving priority in spilling blocks to the applications that benefit more from extra capacity. However, the DSR paradigm only exploits the coarse-grained, application-level difference in capacity demand, making it less effective because the non-uniformity exists at a much finer level. This paper (i) highlights the observation of cache set-level non-uniformity of capacity demand, and (ii) presents a novel L2 cache design, named SNUG (Set-level Non-Uniformity identifier and Grouper), that exploits this fine-grained non-uniformity to further enhance the effectiveness of cooperative caching. By utilizing a per-set shadow tag array and saturating counter, SNUG can identify whether a set should spill or receive blocks; by using an index-bit flipping scheme, SNUG can group peer sets for spilling and receiving in a flexible way, capturing more opportunities for cooperative caching. We evaluate our design through extensive execution-driven simulations on quad-core CMP systems. Our results show that, for 6 classes of workload combinations, our SNUG cache can improve CMP throughput by up to 22.3%, with an average of 13.9%, over the baseline configuration, while the state-of-the-art DSR scheme achieves an improvement of only up to 14.5%, with an average of 8.4%.
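The two mechanisms named in the abstract, a per-set saturating counter and index-bit flipping, can be illustrated with a minimal Python sketch. The abstract does not give the counter width, the update policy, the threshold, or which index bit is flipped, so every such detail below is an assumption made for illustration, including the hypothetical names `SaturatingCounter`, `classify_set`, and `peer_set_index`. The rule that a high counter marks a set as a spiller is likewise assumed, not taken from the source.

```python
# Illustrative sketch only. Counter width, threshold, and the mapping from
# counter value to "spill"/"receive" are assumptions, not details from the abstract.

class SaturatingCounter:
    """A counter that stays within [0, 2**bits - 1] instead of wrapping around."""

    def __init__(self, bits=3):
        self.max_value = (1 << bits) - 1
        self.value = 0

    def increment(self):
        # Assumed use: called when the set shows demand for extra capacity,
        # for example on a hit in its shadow tags.
        self.value = min(self.value + 1, self.max_value)

    def decrement(self):
        # Assumed use: called when the set shows no such demand.
        self.value = max(self.value - 1, 0)


def classify_set(counter, threshold):
    """Label a set 'spill' at or above the threshold, else 'receive' (assumed rule)."""
    return "spill" if counter.value >= threshold else "receive"


def peer_set_index(set_index, bit):
    """Flip one bit of the set index to name a candidate peer set."""
    return set_index ^ (1 << bit)
```

Because flipping the same bit twice restores the original index, the pairing in this sketch is symmetric: if set A names set B as its peer, set B names set A.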
Keywords :
cache storage; multiprocessing systems; CMP cooperative caching; SNUG; cache performance; capacity demand; capacity sharing; chip multiprocessors; dynamic spill-receive paradigm; fine-grained nonuniformity; index-bit flipping scheme; off-chip memory access; per-set shadow tag array; quad-core CMP systems; set-level nonuniformity identifier; Bandwidth; Computer science; Cooperative caching; Data engineering; Delay; Engineering management; Interleaved codes; Memory management; Random access memory; Resource management; Chip Multiprocessors; Cooperative Caching; Last Level Cache Management; Set-Level Non-Uniformity of Capacity Demand;
Conference_Title :
Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on
Conference_Location :
Atlanta, GA
Print_ISBN :
978-1-4244-6442-5
DOI :
10.1109/IPDPS.2010.5470441