Title :
Rewiring 2 Links Is Enough: Accelerating Failure Recovery in Production Data Center Networks
Author :
Guo Chen ; Youjian Zhao ; Dan Pei ; Dan Li
Author_Institution :
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
fDate :
June 29, 2015 - July 2, 2015
Abstract :
Failures are not uncommon in today's production data center networks (DCNs), and it takes a long time for the network to recover from a failure and find new forwarding paths, which significantly impacts real-time and interactive applications at the upper layer. Failure recovery is slow for two primary reasons. First, downward links in DCNs with a multi-rooted tree topology have no immediate backup paths. Second, the distributed routing protocols used in DCNs take time to converge after a failure. In this paper, we present a fault-tolerant DCN solution, called F2Tree, that significantly shortens failure recovery time in current DCNs using only a small amount of link rewiring and switch configuration changes. Because F2Tree does not change any existing software or hardware, it can be readily deployed in production DCNs, which other existing proposals fail to achieve. Through testbed and emulation experiments, we show that F2Tree reduces failure recovery time by 78%. Our experimental results also show that, for partition-aggregate applications (popular in DCNs) under various failure conditions, F2Tree reduces the ratio of deadline-missing requests by more than 96% compared to current DCNs.
Keywords :
computer centres; failure analysis; fault tolerant computing; real-time systems; F2Tree; backup paths; downward links; failure recovery acceleration; fault-tolerant DCN solution; forwarding paths; interactive applications; multirooted tree topology; partition-aggregate applications; production data center networks; realtime applications; Ports (Computers); Production; Redundancy; Routing; Routing protocols; Switches; Topology; Data center networks; Failure recovery;
Conference_Title :
Distributed Computing Systems (ICDCS), 2015 IEEE 35th International Conference on
Conference_Location :
Columbus, OH
DOI :
10.1109/ICDCS.2015.64