Title :
An Evaluation of Behaviors of S-NUCA CMPs Running Scientific Workload
Author :
Foglia, Pierfrancesco ; Panicucci, Francesco ; Prete, Cosimo Antonio ; Solinas, Marco
Author_Institution :
Dept. of Inf. Eng., Univ. of Pisa, Pisa, Italy
Abstract :
Modern systems can place two or more processors on the same die (Chip Multiprocessors, CMPs), each with its own private caches, while the last-level cache can be either private or shared. Because these systems are affected by the wire-delay problem, NUCA caches have been proposed to hide the effects of that delay and so increase performance. A CMP system that adopts a NUCA as its shared last-level cache must be able to maintain coherence among the lower, private levels of the cache hierarchy. As NUCA caches typically adopt a NoC as the communication infrastructure (in which the communication paradigm is message passing), the coherence protocol has to be directory-based, similar to those proposed for classical DSM systems. Previous works on NUCA-based CMP systems adopt a fixed topology (i.e. the physical position of cores and NUCA banks, and the communication infrastructure), each adopting different coherence strategies. In this paper, we present an evaluation of an 8-CPU CMP system with two levels of cache, in which the L1s are private to each core, while the L2 is a Static-NUCA shared among all cores. We considered two different system topologies (the first with the eight CPUs connected to the NUCA on the same side, the second with half of the CPUs on one side and the others on the opposite side), and for each topology we considered the MESI and MOESI coherence protocols. The results indicate that processor topology has much more effect on performance and NoC bandwidth utilization than the coherence protocol, as a consequence of the data mapping and of an access distribution to the L2 cache that is not uniform across the cache banks.
Keywords :
cache storage; message passing; multiprocessing systems; network-on-chip; MESI topology; MOESI topology; NoC bandwidth utilization; NUCA caches; NoC; S-NUCA CMPs; access distribution; chip multiprocessors; coherence protocol; communication infrastructure; data mapping; fixed topology; message passing; nonuniform cache access architecture; private L1 caches; processor topology; scientific workload; shared L2 cache; static NUCA; Bandwidth; Clocks; Delay effects; Design engineering; Design methodology; Digital systems; Network-on-a-chip; Protocols; Topology; Wire; NUCA; cache; latency; mapping; topology; wire-delay;
Conference_Title :
Digital System Design, Architectures, Methods and Tools, 2009. DSD '09. 12th Euromicro Conference on
Conference_Location :
Patras
Print_ISBN :
978-0-7695-3782-5
DOI :
10.1109/DSD.2009.153