DocumentCode :
572419
Title :
Bypass and insertion algorithms for exclusive last-level caches
Author :
Gaur, Jayesh ; Chaudhuri, Mainak ; Subramoney, Sreenivas
Author_Institution :
Intel Archit. Group, Intel Corp., Bangalore, India
fYear :
2011
fDate :
4-8 June 2011
Firstpage :
81
Lastpage :
92
Abstract :
Inclusive last-level caches (LLCs) waste precious silicon estate due to cross-level replication of cache blocks. As the industry moves toward cache hierarchies with larger inner levels, this wasted cache space leads to bigger performance losses compared to exclusive LLCs. However, exclusive LLCs make the design of replacement policies more challenging. While in an inclusive LLC a block can gather a filtered access history, this is not possible in an exclusive design because the block is de-allocated from the LLC on a hit. As a result, the popular least-recently-used replacement policy and its approximations are rendered ineffective and proper choice of insertion ages of cache blocks becomes even more important in exclusive designs. On the other hand, it is not necessary to fill every block into an exclusive LLC. This is known as selective cache bypassing and is not possible to implement in an inclusive LLC because that would violate inclusion. This paper explores insertion and bypass algorithms for exclusive LLCs. Our detailed execution-driven simulation results show that a combination of our best insertion and bypass policies delivers an improvement of up to 61.2% and on average (geometric mean) 3.4% in terms of instructions retired per cycle (IPC) for 97 single-threaded dynamic instruction traces spanning selected SPEC 2006 and server applications, running on a 2 MB 16-way exclusive LLC compared to a baseline exclusive design in the presence of well-tuned multi-stream hardware prefetchers. The corresponding improvements in throughput for 35 4-way multi-programmed workloads running with an 8 MB 16-way shared exclusive LLC are 20.6% (maximum) and 2.5% (geometric mean).
Keywords :
cache storage; integrated circuit design; 4-way multiprogrammed workloads; IPC; LLC; SPEC 2006; bypass algorithms; cache hierarchies; cross-level cache block replication; exclusive last-level caches; execution-driven simulation; insertion algorithms; instructions retired per cycle; least-recently-used replacement policy; replacement policy design; server applications; silicon estate; single-threaded dynamic instruction traces; Algorithm design and analysis; Coherence; Hardware; Prefetching; Proposals; Resource management; Servers; Exclusive last-level cache; bypass policy; insertion policy;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Computer Architecture (ISCA), 2011 38th Annual International Symposium on
Conference_Location :
San Jose, CA
ISSN :
1063-6897
Print_ISBN :
978-1-4503-0472-6
Type :
conf
Filename :
6307782
Link To Document :
بازگشت