Abstract :
A fat-tree network is scalable, but the probability of network faults also increases with the fat-tree size. Faulttolerant routing in fat tree is thus particularly important. Instead of relying on a time-consuming centralized recovery process, a fast local rerouting algorithm is highly desirable. A deflection based fast local rerouting algorithm, which we call it Local Rerouting by Deflection (LRD), was recently proposed to handle the network fault locally and dynamically. However, in LRD, all rerouted packets need to follow an elongated recovery path, which incurs extra propagation delay and additional network congestion. Aiming at addressing the inefficiency of LRD, a new local rerouting algorithm, called Local Rerouting by Backtracking (LRB), is proposed in this paper. With LRB, the faultdetecting switch immediately reroutes the affected packets back to where they came from until a V-turn switch is found. A V-turn switch is a switch from where an alternate minimal path to the destination is found. We show that LRB can minimize packet loss, incur no penalty in path length (except the small amount of backtracked in-flight packets), and treat backtracked packets as implicit fault notification. To ensure efficient implementation of our LRB algorithm, a simple self-routing (instead of table lookup) mechanism is devised to simplify the switch operation. As compared with LRD, we show by simulations that our LRB can minimize the TCP performance degradation caused by network faults.
Keywords :
"Switches","Servers","Routing","Algorithm design and analysis","Topology","Table lookup","Fault tolerance"