DocumentCode :
3471533
Title :
OFAR-CM: Efficient Dragonfly Networks with Simple Congestion Management
Author :
Garcia, M.A. ; Vallejo, Enrique ; Beivide, Ramon ; Valero, M.R. ; Rodriguez, German
Author_Institution :
IBM Res., Zurich, Switzerland
fYear :
2013
fDate :
21-23 Aug. 2013
Firstpage :
55
Lastpage :
62
Abstract :
Dragonfly networks are appealing topologies for large-scale data center and HPC networks that provide high throughput with low diameter and moderate cost. However, they are prone to congestion under certain frequent traffic patterns that saturate specific network links. Adaptive non-minimal routing can be used to avoid such congestion: it employs longer paths to circumvent congested local or global links. However, if a distance-based deadlock avoidance mechanism is employed, more Virtual Channels (VCs) are required, which increases design complexity and cost. OFAR (On-the-Fly Adaptive Routing) is a previously proposed routing mechanism that decouples VCs from deadlock avoidance, making local and global misrouting affordable. However, the severity of congestion with OFAR is higher, as it relies on an escape subnetwork with low bisection bandwidth. Additionally, OFAR allows unlimited misrouting on the escape subnetwork, leading to unbounded paths in the network and long latencies. In this paper we propose and evaluate OFAR-CM, a variant of OFAR combined with a simple congestion management (CM) mechanism that relies only on local information, specifically the credit count of the output ports in the local router. With simple escape subnetworks such as a Hamiltonian ring or a tree, OFAR outperforms former proposals with distance-based deadlock avoidance. Additionally, although long paths are allowed in theory, in practice packets arrive at their destination in a small number of hops. Altogether, OFAR-CM constitutes the first practicable mechanism to date that supports both local and global misrouting in Dragonfly networks.
Keywords :
computer centres; interconnections; parallel processing; telecommunication congestion control; telecommunication network routing; telecommunication network topology; telecommunication traffic; CM; HPC networks; Hamiltonian ring; Hamiltonian tree; OFAR-CM; VC decoupling; adaptive nonminimal routing; bisection bandwidth; congestion management mechanism; congestion severity; deadlock avoidance; dragonfly networks; escape subnetwork; global congested links; global misrouting; large-scale datacenter; local congested links; local misrouting; local router; on-the-fly adaptive routing; output ports; traffic patterns; unbounded paths; unlimited misroutings; Electronic countermeasures; Network topology; Ports (Computers); Routing; System recovery; Throughput; Topology; Congestion Management; Deadlock Avoidance; Dragonfly Networks;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
High-Performance Interconnects (HOTI), 2013 IEEE 21st Annual Symposium on
Conference_Location :
San Jose, CA
Type :
conf
DOI :
10.1109/HOTI.2013.16
Filename :
6627736