Title :
TCPT: thread criticality-driven prefetcher throttling
Author :
He, Yuan ; Sasaki, Hiroshi ; Miwa, Shinobu ; Nakamura, Hiroshi
Author_Institution :
Univ. of Tokyo, Tokyo, Japan
Abstract :
The inevitable advent of the multi-core era has driven an increasing demand for low-latency on-chip interconnection networks (NoCs). As a critical part of the memory hierarchy in modern chip multiprocessors (CMPs), these networks face stringent design constraints: they must provide fast communication within a tight power budget. Latency is clearly the first-order concern of modern NoCs, yet we find that the internal bandwidth of their routers is relatively plentiful. We therefore present a low-latency router design, McRouter, based on a technique we call “multicast within a router”, which puts the remaining bandwidth inside a NoC router to productive use. McRouter enables single-cycle transfer of flits, shortening communication latency whenever enough bandwidth remains within the router. The key idea is to transmit a header flit to all possible output ports (multicast), so that it always reaches the correct output port without relying on route computation. In addition, we find that McRouter incurs only marginal power overhead and remains a stand-alone design that preserves portability and modularity (unlike designs based on look-ahead routing). Our evaluation with application traffic shows that McRouter helps achieve system speed-ups of 1.28, 1.17, and 1.05 over the conventional router (CR), the VSA router (VSAR), and the prediction router (PR), respectively.
Keywords :
integrated circuit design; memory architecture; multiprocessor interconnection networks; network routing; network-on-chip; performance evaluation; power aware computing; CMP; McRouter; NoC; bandwidth utilization; chip multiprocessors; communication latency; design constraints; high performance network-on-chips; low latency on-chip interconnection networks; low latency router design; memory hierarchy; modularity; multicast within a router technique; portability; power overhead; single cycle transfer; Computer architecture; Delays; Pipelines; Registers; Switches; memory systems; multi-core; prefetching; throttling;
Conference_Title :
Parallel Architectures and Compilation Techniques (PACT), 2013 22nd International Conference on
Conference_Location :
Edinburgh
Print_ISBN :
978-1-4799-1018-2
DOI :
10.1109/PACT.2013.6618828