DocumentCode
2949651
Title
Adaptive Spill-Receive for robust high-performance caching in CMPs
Author
Qureshi, Moinuddin K.
Author_Institution
Res. Div., T.J. Watson Res. Center, IBM, Yorktown Heights, NY
fYear
2009
fDate
14-18 Feb. 2009
Firstpage
45
Lastpage
54
Abstract
In a chip multi-processor (CMP) with private caches, the last-level cache is statically partitioned between all the cores. This prevents such CMPs from sharing cache capacity in response to the requirements of individual cores. Capacity sharing can be provided in private caches by spilling a line evicted from one cache to another cache. However, naively allowing all caches to spill evicted lines to other caches has limited performance benefit, as such spilling does not take into account which cores benefit from extra capacity and which cores can provide extra capacity. This paper proposes dynamic spill-receive (DSR) for efficient capacity sharing. In a DSR architecture, each cache uses set dueling to learn whether it should act as a "spiller cache" or "receiver cache" for best overall performance. We evaluate DSR for a quad-core system with 1MB private caches using 495 multi-programmed workloads. DSR improves average throughput by 18% (weighted-speedup by 13% and harmonic-mean fairness metric by 36%) compared to no spilling. DSR requires a total storage overhead of less than two bytes per core, does not require any changes to the existing cache structure, and is scalable to a large number of cores (16 in our evaluation). Furthermore, we propose a simple extension of DSR that provides quality of service (QoS) by guaranteeing that the worst-case performance of each application remains similar to that with no spilling, while still providing an average throughput improvement of 17.5%.
Keywords
cache storage; microprocessor chips; quality of service; CMP; DSR architecture; adaptive spill-receive; cache capacity; cache structure; capacity sharing; chip multiprocessor; dynamic spill-receive; harmonic-mean fairness metric; multiprogrammed workloads; private caches; quad-core system; quality of service; receiver cache; robust high-performance caching; spiller cache; storage overhead; Bandwidth; Cache storage; Cooperative caching; Delay; Design optimization; Fabrics; Quality of service; Robustness; Throughput; Wire;
fLanguage
English
Publisher
IEEE
Conference_Titel
High Performance Computer Architecture, 2009. HPCA 2009. IEEE 15th International Symposium on
Conference_Location
Raleigh, NC
ISSN
1530-0897
Print_ISBN
978-1-4244-2932-5
Type
conf
DOI
10.1109/HPCA.2009.4798236
Filename
4798236