DocumentCode :
2978499
Title :
Discovery and Routing of Degraded Fat-Trees
Author :
Bogdanski, Bartosz ; Johnsen, Bjorn Dag ; Reinemo, Sven-Arne ; Sem-Jacobsen, Frank Olaf
Author_Institution :
Oracle Corp., Oslo, Norway
fYear :
2012
fDate :
14-16 Dec. 2012
Firstpage :
697
Lastpage :
702
Abstract :
The fat-tree topology has become a popular choice for InfiniBand enterprise systems due to its deadlock freedom, fault-tolerance and full bisection bandwidth. In the HPC domain, InfiniBand fabric is used in almost 42% of the systems on the latest Top 500 list, and many of those systems are based on the fat-tree topology. Despite the popularity of the fat-tree topology, little research has been done to compare the behavior of InfiniBand routing algorithms on degraded fat-tree topologies. In this paper, we identify the weaknesses of the current fat-tree routing and propose enhancements that liberalize the restrictions imposed on the routed fabric. Furthermore, we present a thorough analysis of non-proprietary routing algorithms that are implemented in the InfiniBand Open Subnet Manager. Our results show that even though the performance of a fat-tree routed network deteriorates predictably with the number of failed links, the fat-tree routing algorithm is still the best choice for severely degraded fat-tree fabrics.
Keywords :
computer network performance evaluation; fault tolerance; field buses; telecommunication links; telecommunication network routing; telecommunication network topology; HPC domain; InfiniBand enterprise systems; InfiniBand fabric; InfiniBand open subnet manager; InfiniBand routing algorithms; bisection bandwidth; deadlock freedom; degraded fat-tree discovery; degraded fat-tree fabrics; degraded fat-tree routing; fat-tree routed network performance; fat-tree topology; fault-tolerance; link failure; nonproprietary routing algorithms; Fabrics; Network topology; Ports (Computers); Routing; Switches; System recovery; Topology; InfiniBand; fat-tree; fault-tolerance; routing algorithms;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel and Distributed Computing, Applications and Technologies (PDCAT), 2012 13th International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-0-7695-4879-1
Type :
conf
DOI :
10.1109/PDCAT.2012.67
Filename :
6589362