Title :
16.1 A 340mV-to-0.9V 20.2Tb/s source-synchronous hybrid packet/circuit-switched 16×16 network-on-chip in 22nm tri-gate CMOS
Author :
Chen, Gang ; Anders, Mark A. ; Kaul, Himanshu ; Satpathy, Sudhir K. ; Mathew, Sanu K. ; Hsu, S.K. ; Agarwal, Abhishek ; Krishnamurthy, Ram K. ; Borkar, Shekhar ; De, Vivek
Author_Institution :
Intel, Hillsboro, OR, USA
Abstract :
Energy-efficient networks-on-chip (NoCs) are key enablers for exa-scale computation by shifting power budget from communication toward computation. As core counts scale into the 100s, on-chip interconnect fabrics must support increasing heterogeneity and voltage/clock domains. Synchronous NoCs require either a single clock distributed globally or clock-crossing data FIFOs between clock domains [1]. A global clock requires costly full-chip margining and significant power and area for clock distribution, while synchronizing data FIFOs add power, performance, and area overhead per clock crossing. Source-synchronous NoCs mitigate these penalties by forwarding a local clock along with each packet, but still suffer from high data storage power due to packet switching. Circuit switching removes intra-route data storage, but suffers from low network utilization due to serialized channel setup and data transfer [2]. Hybrid packet/circuit switching parallelizes these operations for higher network utilization. A 16×16 mesh, 112b data, 256 voltage/clock domain NoC with source-synchronous operation, hybrid packet/circuit-switched flow control, and ultra-low-voltage optimizations is fabricated in 22nm tri-gate CMOS [3] to enable: i) 20.2Tb/s total throughput at 0.9V, 25°C, ii) a 2.7× increase in bisection bandwidth to 2.8Tb/s and 93% reduction in circuit-switched latency at 407ps/hop through source-synchronous operation, iii) a 62% latency improvement and 55% increase in energy efficiency to 7.0Tb/s/W through circuit switching, iv) a peak energy efficiency of 18.3Tb/s/W for near-threshold operation at 430mV, 25°C, and v) ultra-low-voltage operation down to 340mV with router power scaling to 363μW.
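The headline figures in the abstract imply a few derived quantities that can be checked with simple arithmetic. The sketch below is illustrative only and is not taken from the paper. The helper names are hypothetical, and it assumes that "2.7× increase" means a ratio of 2.7 and that "93% reduction" and "55% increase" are relative to an unstated baseline.

```python
# Illustrative arithmetic on figures quoted in the abstract.
# Helper names are hypothetical; the baselines are inferred, not reported.

def energy_per_bit_fj(tbps_per_watt: float) -> float:
    """Convert energy efficiency in Tb/s/W to energy per bit in fJ.

    1 Tb/s/W equals 1e12 bit/J, i.e. 1 pJ/bit, so x Tb/s/W is 1000/x fJ/bit.
    """
    return 1000.0 / tbps_per_watt

def baseline_before_increase(value: float, pct_increase: float) -> float:
    """Value before a stated percentage increase (assumes a relative increase)."""
    return value / (1.0 + pct_increase / 100.0)

def baseline_before_reduction(value: float, pct_reduction: float) -> float:
    """Value before a stated percentage reduction (assumes a relative reduction)."""
    return value / (1.0 - pct_reduction / 100.0)

if __name__ == "__main__":
    # Peak efficiency at 430mV and circuit-switched efficiency, as energy per bit
    print(f"18.3 Tb/s/W -> {energy_per_bit_fj(18.3):.1f} fJ/bit")
    print(f" 7.0 Tb/s/W -> {energy_per_bit_fj(7.0):.1f} fJ/bit")
    # Implied baselines, under the assumptions stated above
    print(f"efficiency before 55% increase: {baseline_before_increase(7.0, 55):.2f} Tb/s/W")
    print(f"latency before 93% reduction:   {baseline_before_reduction(407, 93):.0f} ps/hop")
    print(f"bisection BW before 2.7x gain:  {2.8 / 2.7:.2f} Tb/s")
```

Under these assumptions, the 18.3Tb/s/W peak corresponds to roughly 55 fJ per bit, and the 407ps/hop latency implies a baseline near 5.8ns/hop.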
Keywords :
CMOS integrated circuits; circuit switching; integrated circuit design; low-power electronics; network-on-chip; packet switching; synchronisation; bit rate 20.2 Tbit/s; clock-crossing data FIFO; energy-efficient networks-on-chip; exascale computation; global clock; hybrid packet-circuit-switched flow control; intraroute data storage; on-chip interconnect fabrics; power 363 μW; size 22 nm; source-synchronous NoC; source-synchronous operation; temperature 25 °C; trigate CMOS; ultralow-voltage optimizations; voltage 0.9 V; voltage 340 mV; voltage 430 mV; voltage-clock domains; Clocks; Data transfer; Delays; Energy efficiency; Ports (Computers); Synchronization; Throughput
Conference_Title :
Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014 IEEE International
Conference_Location :
San Francisco, CA
Print_ISBN :
978-1-4799-0918-6
DOI :
10.1109/ISSCC.2014.6757432