Achieving High-Performance On-Chip Networks With Shared-Buffer Routers

Author

Tran, A.T. ; Baas, Bevan M.

Author_Institution

Electr. & Comput. Eng. (ECE) Dept., Univ. of California at Davis, Davis, CA, USA

Volume

22

Issue

6

fYear

2014

fDate

June 2014

Firstpage

1391

Lastpage

1403

Abstract

On-chip routers typically have buffers dedicated to their input or output ports for temporarily storing packets when contention occurs on output physical channels. Buffers, unfortunately, consume significant portions of router area and power budgets. While running a traffic trace, however, not all input ports of routers have incoming packets that need to be transferred simultaneously. Therefore, a large number of buffer queues in the network are empty while other queues are mostly busy. This observation motivates us to design router architecture with shared queues (RoShaQ), a router architecture that maximizes buffer utilization by allowing multiple buffer queues to be shared among input ports. Sharing queues makes buffer use more efficient and hence achieves higher throughput when the network load becomes heavy. On the other hand, at light traffic load, our router achieves low latency by allowing packets to effectively bypass these shared queues. Experimental results on a 65-nm CMOS standard-cell process show that, over synthetic traffic patterns, RoShaQ has 17% lower latency and 18% higher saturation throughput than a typical virtual-channel (VC) router. Because of its higher performance, RoShaQ consumes 9% less energy per transferred packet than a VC router given the same buffer space capacity. Over real multitask applications and E3S embedded benchmarks using the near-optimal NMAP mapping algorithm, RoShaQ has 32% lower latency than a VC router and, targeting the same application throughput, 30% lower energy per packet.
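The two behaviors described in the abstract (bypassing the shared queues at light load, and falling back to a pool of queues shared by all input ports under contention) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the authors' RoShaQ microarchitecture: the class `SharedQueueRouter`, its methods, the least-occupied-queue policy, and the one-packet-per-output-per-cycle rule are all hypothetical choices made for illustration.

```python
from collections import deque


class SharedQueueRouter:
    """Toy model of a router whose inputs share one pool of queues.

    Hypothetical illustration only; it does not reproduce RoShaQ's
    allocators, pipeline stages, or flow control.
    """

    def __init__(self, num_outputs, num_shared_queues, queue_depth):
        self.output_busy = [False] * num_outputs
        self.shared = [deque() for _ in range(num_shared_queues)]
        self.queue_depth = queue_depth

    def _pending(self, out_port):
        # True if an older packet for this output is still queued.
        return any(port == out_port for q in self.shared for _, port in q)

    def accept(self, packet, out_port):
        """Handle one arriving packet; return 'bypass', 'queued', or 'blocked'."""
        # Light load: the output is idle and nothing older is waiting for it,
        # so the packet skips the shared queues entirely.
        if not self.output_busy[out_port] and not self._pending(out_port):
            self.output_busy[out_port] = True
            return "bypass"
        # Contention: store the packet in the least occupied shared queue
        # (assumed policy), regardless of which input port it came from.
        q = min(self.shared, key=len)
        if len(q) < self.queue_depth:
            q.append((packet, out_port))
            return "queued"
        return "blocked"  # every shared queue is full

    def next_cycle(self):
        """Start a new cycle: free outputs, then let queue heads claim them."""
        self.output_busy = [False] * len(self.output_busy)
        sent = []
        for q in self.shared:
            if q and not self.output_busy[q[0][1]]:
                packet, out_port = q.popleft()
                self.output_busy[out_port] = True
                sent.append(packet)
        return sent


router = SharedQueueRouter(num_outputs=2, num_shared_queues=2, queue_depth=1)
print(router.accept("a", 0))  # output 0 idle
print(router.accept("b", 0))  # output 0 already taken this cycle
print(router.accept("c", 1))  # output 1 idle
print(router.next_cycle())    # queued packets drain first in the new cycle
```

In this sketch the queues belong to no particular input port, so a burst on one port can use storage that would sit idle if each port had its own dedicated buffer; that is the utilization argument the abstract makes, shown here only qualitatively.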

Keywords

CMOS integrated circuits; buffer circuits; network routing; network-on-chip; CMOS standard-cell process; E3S embedded benchmarks; RoShaQ; buffer utilization; light traffic load; multiple buffer queues; multitask applications; near-optimal NMAP mapping; network load; on-chip networks; on-chip routers; power budgets; router architecture; router area; shared queues; shared-buffer routers; size 65 nm; synthetic traffics; Application mapping; networks on-chip; shared-buffer

fLanguage

English

Journal_Title

Very Large Scale Integration (VLSI) Systems, IEEE Transactions on

Publisher

IEEE

ISSN

1063-8210

Type

jour

DOI

10.1109/TVLSI.2013.2268548

Filename

6553191