DocumentCode
24072
Title
Achieving High-Performance On-Chip Networks With Shared-Buffer Routers
Author
Tran, A.T. ; Baas, Bevan M.
Author_Institution
Electr. & Comput. Eng. (ECE) Dept., Univ. of California at Davis, Davis, CA, USA
Volume
22
Issue
6
fYear
2014
fDate
Jun-14
Firstpage
1391
Lastpage
1403
Abstract
On-chip routers typically have buffers dedicated to their input or output ports for temporarily storing packets when contention occurs on output physical channels. Buffers, unfortunately, consume significant portions of router area and power budgets. While running a traffic trace, however, not all input ports of routers have incoming packets that need to be transferred simultaneously. Therefore, a large number of buffer queues in the network are empty while other queues are mostly busy. This observation motivates us to design router architecture with shared queues (RoShaQ), a router architecture that maximizes buffer utilization by allowing multiple buffer queues to be shared among input ports. Sharing queues makes buffer use more efficient and hence achieves higher throughput when the network load becomes heavy. On the other hand, at light traffic load, our router achieves low latency by allowing packets to effectively bypass these shared queues. Experimental results on a 65-nm CMOS standard-cell process show that, over synthetic traffic patterns, RoShaQ has 17% lower latency and 18% higher saturation throughput than a typical virtual-channel (VC) router. Because of its higher performance, RoShaQ consumes 9% less energy per transferred packet than a VC router given the same buffer space capacity. Over real multitask applications and E3S embedded benchmarks using the near-optimal NMAP mapping algorithm, RoShaQ has 32% lower latency than a VC router and, when targeting the same application throughput, consumes 30% less energy per packet.
Keywords
CMOS integrated circuits; buffer circuits; network routing; network-on-chip; CMOS standard-cell process; E3S embedded benchmarks; RoShaQ; buffer utilization; light traffic load; multiple buffer queues; multitask applications; near-optimal NMAP mapping; network load; on-chip networks; on-chip routers; power budgets; router architecture; router area; shared queues; shared-buffer routers; size 65 nm; synthetic traffics; Application mapping; networks on-chip; shared-buffer
fLanguage
English
Journal_Title
Very Large Scale Integration (VLSI) Systems, IEEE Transactions on
Publisher
IEEE
ISSN
1063-8210
Type
jour
DOI
10.1109/TVLSI.2013.2268548
Filename
6553191