Regional Information Center for Science and Technology - Exploiting set-level non-uniformity of capacity demand to enhance CMP cooperative caching

DocumentCode :

2441306

Title :

Exploiting set-level non-uniformity of capacity demand to enhance CMP cooperative caching

Author :

Zhan, Dongyuan ; Jiang, Hong ; Seth, Sharad C.

Author_Institution :

Dept. of Comput. Sci. & Eng., Univ. of Nebraska - Lincoln, Lincoln, NE, USA

fYear :

2010

fDate :

19-23 April 2010

Firstpage :

Lastpage :

Abstract :

As the Memory Wall remains a bottleneck for Chip Multiprocessors (CMPs), effective management of CMP last-level caches becomes paramount for minimizing expensive off-chip memory accesses. For CMPs with private last-level caches, Cooperative Caching (CC) has been proposed to enable capacity sharing among private caches by spilling a block evicted from one cache into another. However, this eviction-driven CC does not necessarily improve cache performance, since it implicitly favors applications that generate many block evictions, regardless of their actual capacity demand. The recent Dynamic Spill-Receive (DSR) paradigm improves CC by giving applications that benefit more from extra capacity priority in spilling blocks. However, DSR exploits only the coarse-grained, application-level difference in capacity demand, which limits its effectiveness because non-uniformity also exists at a much finer level. This paper (i) highlights the observation of cache set-level non-uniformity of capacity demand, and (ii) presents a novel L2 cache design, named SNUG (Set-level Non-Uniformity identifier and Grouper), that exploits this fine-grained non-uniformity to further enhance the effectiveness of cooperative caching. Using a per-set shadow tag array and saturating counter, SNUG identifies whether a set should spill or receive blocks; using an index-bit flipping scheme, SNUG groups peer sets for spilling and receiving in a flexible way, capturing more opportunities for cooperative caching. We evaluate our design through extensive execution-driven simulations of quad-core CMP systems. Our results show that, across 6 classes of workload combinations, SNUG improves CMP throughput over the baseline configuration by up to 22.3%, and by 13.9% on average, whereas the state-of-the-art DSR scheme achieves improvements of only up to 14.5%, and 8.4% on average.
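The abstract only outlines SNUG's two mechanisms, so the following is a minimal Python sketch under stated assumptions, not the authors' design. It assumes that a set's saturating counter is incremented when a miss hits in that set's shadow tags (extra capacity would have helped) and decremented when a miss also misses the shadow tags, that a set is a spiller when the counter's most significant bit is set, and that a peer set in another private cache is located by XOR-ing the set index with a flip mask. The names `SetMonitor`, `peer_set`, `spill_target`, `shadow_ways`, `counter_bits`, and `flip_mask` are hypothetical.

```python
from collections import OrderedDict
from typing import List, Optional


class SetMonitor:
    """Hypothetical per-set monitor: a shadow tag array plus a saturating counter."""

    def __init__(self, shadow_ways: int = 4, counter_bits: int = 4) -> None:
        self.shadow: "OrderedDict[int, None]" = OrderedDict()  # recently evicted tags, oldest first
        self.shadow_ways = shadow_ways
        self.counter_bits = counter_bits
        self.max_count = (1 << counter_bits) - 1
        self.counter = self.max_count // 2  # start just below the spill threshold

    def on_evict(self, tag: int) -> None:
        # Record the victim's tag; drop the oldest shadow entry when the array is full.
        self.shadow[tag] = None
        self.shadow.move_to_end(tag)
        if len(self.shadow) > self.shadow_ways:
            self.shadow.popitem(last=False)

    def on_miss(self, tag: int) -> None:
        if tag in self.shadow:
            # Shadow hit: more capacity would have turned this miss into a hit.
            self.counter = min(self.counter + 1, self.max_count)
            del self.shadow[tag]  # the block is being refetched into the set
        else:
            # Assumed heuristic: a miss that extra capacity would not have caught.
            self.counter = max(self.counter - 1, 0)

    def role(self) -> str:
        # MSB of the saturating counter set => the set should spill blocks.
        return "spill" if self.counter >> (self.counter_bits - 1) else "receive"


def peer_set(set_index: int, flip_mask: int, num_sets: int) -> int:
    """Hypothetical grouping: flip the index bits selected by flip_mask."""
    if num_sets & (num_sets - 1) or not 0 <= flip_mask < num_sets:
        raise ValueError("num_sets must be a power of two and flip_mask < num_sets")
    return set_index ^ flip_mask


def spill_target(local: List[SetMonitor], remote: List[SetMonitor],
                 set_index: int, flip_mask: int) -> Optional[int]:
    """Return the peer set in the remote cache that may receive a victim, if any."""
    if local[set_index].role() != "spill":
        return None
    peer = peer_set(set_index, flip_mask, len(remote))
    return peer if remote[peer].role() == "receive" else None


if __name__ == "__main__":
    m = SetMonitor(shadow_ways=2, counter_bits=4)
    m.on_evict(1)
    m.on_evict(2)
    m.on_miss(1)  # shadow hit
    m.on_miss(2)  # shadow hit
    print(m.counter, m.role())  # prints "9 spill"
```

Using a single flip mask per cache pair keeps the peer lookup to one XOR on the set index; how SNUG actually chooses which bits to flip is not specified in the abstract.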

Keywords :

cache storage; multiprocessing systems; CMP cooperative caching; SNUG; cache performance; capacity demand; capacity sharing; chip multiprocessors; dynamic spill-receive paradigm; fine-grained nonuniformity; index-bit flipping scheme; off-chip memory access; per-set shadow tag array; quad-core CMP systems; set-level nonuniformity identifier; Bandwidth; Computer science; Cooperative caching; Data engineering; Delay; Engineering management; Interleaved codes; Memory management; Random access memory; Resource management; Chip Multiprocessors; Cooperative Caching; Last Level Cache Management; Set-Level Non-Uniformity of Capacity Demand;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on

Conference_Location :

Atlanta, GA

ISSN :

1530-2075

Print_ISBN :

978-1-4244-6442-5

Type :

conf

DOI :

10.1109/IPDPS.2010.5470441

Filename :

5470441

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2441306