DocumentCode :
1796824
Title :
Single-cycle collective communication over a shared network fabric
Author :
Krishna, Tushar ; Peh, Li-Shiuan
Author_Institution :
VSSAD, Intel Corp., Hudson, MA, USA
fYear :
2014
fDate :
17-19 Sept. 2014
Firstpage :
1
Lastpage :
8
Abstract :
In the multicore era, on-chip network latency and throughput have a direct impact on system performance. A highly important class of communication flows traversing the network is collective, i.e., one-to-many and many-to-one. Scalable coherence protocols often leverage imprecise tracking to lower the overhead of directory storage, in turn leading to more collective communication on-chip. Routers with support for message forking/aggregation have been demonstrated previously to support such protocols. However, even with the fastest possible designs today (1-cycle routers), collective flows on a k×k mesh still incur delays proportional to k, since all communication is across the entire chip. As k increases across technology generations, the latency of these flows will also go up. However, the pure wire delay to cross the chip is just 1-2 cycles today, and is expected to remain roughly invariant. The dependence of message delays on k arises from the requirement to latch messages at every router. In this work, we remove this requirement. We design a network fabric that enables messages to (1) dynamically create virtual 1-to-Many (multicast) and Many-to-1 (reduction) tree routes over a physical mesh, (2) get forked/aggregated at nodes on the tree, and (3) traverse the tree, all within a single cycle across each dimension. For synthetic 1-to-Many/Many-to-1 flows, we demonstrate a 76/82% reduction in latency and a 1.6/2X improvement in throughput over a state-of-the-art NoC with 1-cycle routers and support for collective communication. Across a suite of SPLASH-2 and PARSEC benchmarks, full-system runtime and energy are reduced by 14% and 50%, respectively, for a limited-directory protocol.
Keywords :
microprocessor chips; multicast protocols; network routing; network-on-chip; NoC design; PARSEC benchmarks; SPLASH-2 benchmarks; communications on-chip; latch messages; on-chip network; scalable coherence protocols; shared network fabric; single-cycle collective communication; Delays; Routing protocols; Switches; Throughput; Unicast; Wires;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Networks-on-Chip (NoCS), 2014 Eighth IEEE/ACM International Symposium on
Conference_Location :
Ferrara
Type :
conf
DOI :
10.1109/NOCS.2014.7008755
Filename :
7008755