Adaptive Spill-Receive for robust high-performance caching in CMPs

Author

Qureshi, Moinuddin K.

Author_Institution

Res. Div., T.J. Watson Res. Center, IBM, Yorktown Heights, NY

fYear

2009

fDate

14-18 Feb. 2009

Firstpage

45

Lastpage

54

Abstract

In a chip multi-processor (CMP) with private caches, the last-level cache is statically partitioned between all the cores. This prevents such CMPs from sharing cache capacity in response to the requirements of individual cores. Capacity sharing can be provided in private caches by spilling a line evicted from one cache to another cache. However, naively allowing all caches to spill evicted lines to other caches has limited performance benefit, as such spilling does not take into account which cores benefit from extra capacity and which cores can provide extra capacity. This paper proposes dynamic spill-receive (DSR) for efficient capacity sharing. In a DSR architecture, each cache uses set dueling to learn whether it should act as a "spiller cache" or a "receiver cache" for best overall performance. We evaluate DSR for a quad-core system with 1MB private caches using 495 multi-programmed workloads. DSR improves average throughput by 18% (weighted speedup by 13% and the harmonic-mean fairness metric by 36%) compared to no spilling. DSR requires a total storage overhead of less than two bytes per core, does not require any changes to the existing cache structure, and is scalable to a large number of cores (16 in our evaluation). Furthermore, we propose a simple extension of DSR that provides quality of service (QoS) by guaranteeing that the worst-case performance of each application remains similar to that with no spilling, while still providing an average throughput improvement of 17.5%.
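The set-dueling decision described in the abstract can be illustrated with a minimal sketch. The abstract only states that each cache uses set dueling to choose between acting as a spiller and acting as a receiver; everything else below is an assumption made for illustration: the class name `SpillReceivePolicy`, the choice of sample sets by set index modulo a stride, the 10-bit saturating counter (picked only so that it fits within the stated two bytes per core), and the rule that only misses in the cache's own sample sets update the counter. It is not the paper's exact mechanism.

```python
class SpillReceivePolicy:
    """Illustrative per-cache set-dueling selector (hypothetical, simplified)."""

    def __init__(self, num_sets=1024, sample_stride=32, counter_bits=10):
        # All parameter values are assumptions, not taken from the paper.
        self.num_sets = num_sets
        self.sample_stride = sample_stride
        self.max_count = (1 << counter_bits) - 1
        self.threshold = 1 << (counter_bits - 1)
        self.psel = self.threshold  # saturating counter, starts at the midpoint

    def set_role(self, set_index):
        """Classify a set as a dedicated sample set or a follower set."""
        remainder = set_index % self.sample_stride
        if remainder == 0:
            return "spill"      # this set always spills its evicted lines
        if remainder == 1:
            return "receive"    # this set always accepts spilled lines
        return "follower"       # this set adopts whichever policy is winning

    def record_miss(self, set_index):
        """Update the counter on a miss; follower sets leave it unchanged."""
        role = self.set_role(set_index)
        if role == "spill":
            self.psel = min(self.psel + 1, self.max_count)
        elif role == "receive":
            self.psel = max(self.psel - 1, 0)

    def follower_mode(self):
        """Return the policy with fewer recorded misses in its sample sets."""
        # A high counter means the always-spill sets missed more often,
        # so the follower sets act as receivers, and vice versa.
        return "receive" if self.psel >= self.threshold else "spill"


policy = SpillReceivePolicy()
for _ in range(5):
    policy.record_miss(1)   # misses in an always-receive sample set
print(policy.follower_mode())
```

In this sketch the only per-cache state is the single counter `psel`, which is the kind of small overhead the abstract refers to; the sample sets themselves are ordinary cache sets whose policy is fixed by their index.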

Keywords

cache storage; microprocessor chips; quality of service; CMP; DSR architecture; adaptive spill-receive; cache capacity; cache structure; capacity sharing; chip multiprocessor; dynamic spill-receive; harmonic-mean fairness metric; multiprogrammed workloads; private caches; quad-core system; quality of service; receiver cache; robust high-performance caching; spiller cache; storage overhead; Bandwidth; Cache storage; Cooperative caching; Delay; Design optimization; Fabrics; Quality of service; Robustness; Throughput; Wire;

fLanguage

English

Publisher

IEEE

Conference_Titel

High Performance Computer Architecture, 2009. HPCA 2009. IEEE 15th International Symposium on

Conference_Location

Raleigh, NC

ISSN

1530-0897

Print_ISBN

978-1-4244-2932-5

Type

conf

DOI

10.1109/HPCA.2009.4798236

Filename

4798236