Title :
Re-NUCA: Boosting CMP Performance Through Block Replication
Author :
Foglia, Pierfrancesco ; Prete, Cosimo Antonio ; Solinas, Marco ; Monni, Giovanna
Author_Institution :
Dipt. di Ing. dell'Inf., Univ. di Pisa, Pisa, Italy
Abstract :
Chip Multiprocessor (CMP) systems have become the reference architecture for designing microprocessors, thanks to improvements in semiconductor nanotechnology that have continuously provided a growing number of faster and smaller per-chip transistors. Interest in CMPs has grown because classical techniques for boosting performance, e.g. increasing the clock frequency and the amount of work performed at each clock cycle, can no longer deliver significant improvements due to energy constraints and wire-delay effects. CMP systems generally adopt a large last-level cache (LLC) (typically L2 or L3) shared among all cores, and private L1 caches. As the miss resolution time for private caches depends on the response time of the LLC, which is wire-delay dominated, overall performance is affected by wire delay. NUCA caches have been proposed for single-core and multi-core systems as a mechanism for tolerating wire-delay effects on overall performance. In this paper, we introduce a novel NUCA architecture, called Re-NUCA, specifically suited for (but not limited to) CMPs in which cores are placed at different sides of the shared cache. The idea is to allow shared blocks to be replicated inside the shared cache, in order to avoid the limitations on performance improvement that arise in classical D-NUCA caches due to the conflict hit problem. Our results show that Re-NUCA outperforms D-NUCA by more than 5% on average, and for applications that suffer strongly from the conflict hit problem we observe performance improvements of up to 15%.
Keywords :
cache storage; microprocessor chips; multiprocessing systems; CMP performance; Re-NUCA architecture; block replication; boosting performance; chip multiprocessor; last-level-cache; microprocessor design; miss resolution time; nonuniform cache access; per-chip transistors; private L1 caches; private caches; semiconductor nanotechnology; wire-delay effects; Coherence; Oceans; Program processors; Protocols; Receivers; Technical Activities Guide - TAG; Wire; CMP systems; NUCA; cache memory
Conference_Title :
Digital System Design: Architectures, Methods and Tools (DSD), 2010 13th Euromicro Conference on
Conference_Location :
Lille
Print_ISBN :
978-1-4244-7839-2
DOI :
10.1109/DSD.2010.41