مرکز منطقه ای اطلاع رساني علوم و فناوري - Efficient unicast and multicast support for CMPs

Abstract :

Beyond a certain number of cores, multi-core processing chips will require a network-on-chip (NoC) to interconnect the cores and overcome the limitations of a bus. NoCs must be carefully designed to meet constraints like power consumption, area, and ultra low latencies. Although 2D meshes with DOR (dimension-order-routing) meet these constraints, the need for partitioning (e.g. virtual machines, coherency domains) and traffic isolation may prevent the use of DOR routing. Also, core heterogeneity and manufacturing and run-time faults may lead to partially irregular topologies. Routing in these topologies is complex, and previously proposed solutions required routing tables, which drastically increase power consumption, area, and latency. The exception is LBDR (logic-based distributed routing), a flexible routing method for irregular topologies that removes the need for using routing tables (both at end-nodes and switches), thus achieving large savings in chip area and power consumption. But LBDR lacks support for multicast and broadcast, which are required to efficiently support cache coherence protocols both for single and multiple coherence domains. In this paper we propose bLBDR, an efficient multicast and broadcast mechanism built on top of LBDR. bLBDR performs multicast operations using a logic-based broadcast within a domain (a region with bounds). This allows us to isolate the traffic into different domains, thus enabling the concept of visualization at the NoC level. Also, bLBDR extends the concept of routing regions in LBDR by providing a mechanism that allows the flexible definition of multiple domains, sets of network resources. bLBDR fulfills all the practical requirements, including not only low latency and power and area efficiency, but also support for visualization, partitionability, fault-tolerance, traffic isolation and broadcast across the entire network as well as constrained to coherency domains or regions. All this is achieved by a small and pow- - er efficient routing logic (7times area savings and 17times power reduction when compared to a routing table in an 8 times 8 mesh network).

Keywords :

microprocessor chips; network topology; network-on-chip; power consumption; protocols; CMP; cache coherence protocols; chip multiprocessors; dimension-order-routing; logic-based distributed routing; multicore processing chips; network-on-chip; power consumption; routing tables; Broadcasting; Delay; Energy consumption; Multicast protocols; Network-on-a-chip; Routing; Telecommunication traffic; Topology; Unicast; Visualization;