Title :
Fault-Tolerant Routing for Exascale Supercomputer: The BXI Routing Architecture
Author :
Vignéras;Jean-Noël
Abstract :
BXI (Bull eXascale Interconnect) is the new interconnection network developed by Atos for High Performance Computing. It has been designed to meet the requirements of exascale supercomputers. At such scale, faults have to be expected and dealt with transparently so that applications remain unaffected by them. BXI features various mechanisms for this purpose, one of which is the BXI routing component presented in this paper. The BXI routing module computes the full routing tables for a 64k-node fat-tree in a few minutes. With partial recomputation, however, it can withstand numerous inter-router link failures without any noticeable impact on running applications.
Keywords :
"Routing","Switches","Topology","System recovery","Ports (Computers)","Algorithm design and analysis","Computer architecture"
Conference_Title :
Cluster Computing (CLUSTER), 2015 IEEE International Conference on
DOI :
10.1109/CLUSTER.2015.135