  • DocumentCode
    2931817
  • Title
    Combining Source Routing and Dynamic Fault Tolerance
  • Author
    Sem-Jacobsen, Frank Olaf; Lysne, Olav; Skeie, Tor
  • Author_Institution
    Networks & Distributed Syst., Simula Res. Lab., Lysaker
  • fYear
    2006
  • fDate
    Oct. 2006
  • Firstpage
    151
  • Lastpage
    158
  • Abstract
    An increasing number of interconnect technologies rely on source routing to forward packets through the network. It is therefore important to develop fault tolerance methods that are well suited to source-routed networks. Dynamic fault tolerance allows the network to remain available while faults occur, as opposed to static fault tolerance, which requires the network to be halted so that it can be reconfigured. Source routing readily lets the source node choose a different path when a fault occurs, but with this approach, packets already in the network will be lost. Local dynamic fault tolerance, where a packet is routed around the fault locally, would prevent much of the traffic from being lost during failures. However, this is cumbersome to achieve in source-routed networks, since packets that encounter a fault must follow a path different from the one encoded in the packet header. In this paper we present a mechanism that achieves local dynamic fault tolerance in source-routed fat trees, a topology in widespread use in supercomputer systems, and compare it with endpoint dynamic fault tolerance. We also show that combining the two approaches yields performance superior to either approach individually.
  • Keywords
    computer networks; fault tolerant computing; telecommunication network routing; telecommunication network topology; trees (mathematics); endpoint dynamic fault tolerance; local dynamic fault tolerance; source routed fat trees; Fault tolerance; Fault tolerant systems; Laboratories; Multiprocessor interconnection networks; Network topology; Packet switching; Routing; Supercomputers; Switches; Telecommunication traffic;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Architecture and High Performance Computing, 2006. SBAC-PAD '06. 18th International Symposium on
  • Conference_Location
    Ouro Preto
  • ISSN
    1550-6533
  • Print_ISBN
    0-7695-2704-3
  • Type
    conf
  • DOI
    10.1109/SBAC-PAD.2006.12
  • Filename
    4032427