DocumentCode :
1820607
Title :
Design of an interconnect topology for multi-cores and scale-out workloads
Author :
Vidya, T. ; Ramasubramanian, N.
Author_Institution :
Dept. of Comput. Sci. & Eng., Nat. Inst. of Technol., Tiruchirapalli, India
fYear :
2015
fDate :
26-28 March 2015
Firstpage :
1
Lastpage :
5
Abstract :
Scale-out workloads are applications typically executed in a cloud environment that exhibit a high level of request-level parallelism. Such workloads benefit from processor organizations with very high core counts, since multiple requests can be serviced simultaneously by threads running on these cores. Characterization of these workloads shows that they have large instruction footprints exceeding the capacities of private caches, operate on large datasets with limited reuse, and have minimal coherence activity due to limited data sharing. It also shows that the active instruction window can be captured by a Last Level Cache (LLC) of 8 MB. New processor organizations have been proposed in the literature that tailor the interconnection among cores to match the communication pattern arising from the characteristics of scale-out workloads. The focus of the current work is to take the approach of separating the core and the LLC bank of a single tile, as specified in the literature, and to design a different interconnection topology for cores and LLC banks that reduces the latency of accessing the LLC and thereby improves performance. In the current design, four cores and an LLC bank connect to a router, forming a star topology, and the routers (more than four) form a 2D flattened butterfly topology. The design targets 8 cores, has been implemented in the Bluespec SystemVerilog HDL (Hardware Description Language), and has been synthesized using Xilinx Vivado 2013.2 targeting the Zynq®-7000 family of FPGA boards. The design has been evaluated for different amounts of offered traffic, and the average latency and throughput of the interconnection network have been calculated for a uniform random traffic pattern. An injection rate of 0.05 packets/cycle/core, which corresponds to the maximum L2 miss rate for scale-out workloads, gives an average packet latency of 29.5 clock cycles and a throughput of 0.52 packets/cycle.
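The router-level topology described in the abstract can be illustrated with a minimal sketch. The code below is not from the paper: it builds the adjacency of a generic k×k 2D flattened butterfly (each router linked to every other router in its row and its column) and computes the all-pairs average hop count as a proxy for zero-load latency; the grid size k and the function names are assumptions for illustration only, and the per-router star of four cores and one LLC bank is omitted.

```python
from collections import deque

def flattened_butterfly(k):
    """Adjacency for a k x k 2D flattened butterfly: router (r, c)
    links to every other router in its row and in its column.
    (k is an illustrative parameter, not a value from the paper.)"""
    adj = {}
    for r in range(k):
        for c in range(k):
            adj[(r, c)] = (
                {(r, cc) for cc in range(k) if cc != c}
                | {(rr, c) for rr in range(k) if rr != r}
            )
    return adj

def avg_hops(adj):
    """BFS all-pairs average hop count, a simple proxy for the
    zero-load latency of the router network."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        for dst, d in dist.items():
            if dst != src:
                total += d
                pairs += 1
    return total / pairs

# For k = 3 (nine routers), every router reaches 4 neighbors in one
# hop (its row and column) and the remaining 4 in two hops, so the
# network diameter is 2 and the average hop count is 1.5.
print(avg_hops(flattened_butterfly(3)))  # → 1.5
```

The two-hop diameter (one row hop plus one column hop) is the property that motivates flattened butterflies for latency-sensitive LLC access; actual packet latency in the paper additionally includes router pipeline and contention delays, which this hop-count sketch does not model.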
Keywords :
cache storage; hardware description languages; hypercube networks; multiprocessing systems; network topology; network-on-chip; 2D flattened butterfly topology; Bluespec system Verilog HDL; FPGA boards; LLC bank; LLC size; NoC; Xilinx Vivado 2013.2; Zynq-7000 product family; active instruction window; average packet latency; cloud environment; communication pattern; cores interconnection; data sharing; hardware description language; injection rate; instruction footprints; interconnect topology; interconnection network throughput; interconnection topology; large datasets; last level cache size; maximum L2 miss rate; multicores; network-on-chip; private caches; processor organizations; request level parallelism; scale-out workloads; threads; uniform random traffic pattern; Clocks; Delays; Multiprocessor interconnection; Network topology; Routing; Throughput; Topology; Flattened Butterfly; LLC; Multi-core; NoC; Scale-out workloads; Topology;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Signal Processing, Communication and Networking (ICSCN), 2015 3rd International Conference on
Conference_Location :
Chennai
Print_ISBN :
978-1-4673-6822-3
Type :
conf
DOI :
10.1109/ICSCN.2015.7219837
Filename :
7219837