DocumentCode
2443230
Title
Analysis of Performance Dependencies in NUCA-Based CMP Systems
Author
Foglia, Pierfrancesco ; Panicucci, Francesco ; Prete, Cosimo Antonio ; Solinas, Marco
Author_Institution
Dipt. di Ing. dell'Inf., Univ. di Pisa, Pisa, Italy
fYear
2009
fDate
28-31 Oct. 2009
Firstpage
49
Lastpage
56
Abstract
Improvements in semiconductor nanotechnology have continuously provided a growing number of faster and smaller transistors per chip. However, classical techniques for boosting performance, such as increasing the clock frequency and the amount of work performed in each clock cycle, can no longer deliver significant improvements because of energy constraints and wire-delay effects. As a consequence, designers' interest has shifted toward systems with multiple cores per chip (Chip Multiprocessors, CMP). CMP systems typically adopt private L1 caches and a large last-level cache (LLC) shared among all cores. Because the miss resolution time of the private caches depends on the response time of the LLC, which is dominated by wire delay, overall performance is affected by wire delay. NUCA caches have been proposed for both single-core and multi-core systems as a mechanism for tolerating the effects of wire delay on overall performance. In this paper, we introduce our design for S-NUCA and D-NUCA cache memory systems, and we present an analysis of an 8-CPU CMP system with two levels of cache, in which the L1s are private while the L2 is a NUCA shared among all cores. We consider two different system topologies (the first with all eight CPUs connected to the NUCA on the same side, 8p; the second with half of the CPUs on one side and the others on the opposite side, 4+4p), and for all configurations we evaluate the effectiveness of both the static and the dynamic policies that have been proposed. Our results show that adopting a D-NUCA scheme with the 8p configuration is the best-performing solution among all the considered configurations, and that for the 4+4p configuration the D-NUCA outperforms the S-NUCA in most cases. We highlight that performance is tied both to the mapping strategy (static or dynamic) and to the system topology. We also observe that bandwidth occupancy depends on both the NUCA policy and the topology.
Keywords
cache storage; multiprocessing systems; performance evaluation; D-NUCA cache memory systems; NUCA-based CMP systems; S-NUCA cache memory systems; chip multiprocessors; clock frequency; dynamic NUCA; last-level-cache; multiple cores per chip systems; non-uniform cache access; performance dependency analysis; semiconductor nanotechnology; static NUCA; wire-delay effects; Boosting; Cache memory; Clocks; Delay effects; Energy resolution; Frequency; Nanotechnology; Performance analysis; Topology; Wire; NUCA; cache; coherence protocol; latency; wire delay;
fLanguage
English
Publisher
ieee
Conference_Titel
Computer Architecture and High Performance Computing, 2009. SBAC-PAD '09. 21st International Symposium on
Conference_Location
Sao Paulo
ISSN
1550-6533
Print_ISBN
978-0-7695-3857-0
Type
conf
DOI
10.1109/SBAC-PAD.2009.12
Filename
5336212