DocumentCode :
122384
Title :
Quasi Fat Trees for HPC Clouds and Their Fault-Resilient Closed-Form Routing
Author :
Zahavi, Eitan ; Keslassy, Isaac ; Kolodny, Avinoam
Author_Institution :
Mellanox, USA
fYear :
2014
fDate :
26-28 Aug. 2014
Firstpage :
41
Lastpage :
48
Abstract :
High-Performance Computing (HPC) Clusters and Data Center Networks often rely on fat-tree topologies. However, fat trees and their known variants are not designed for concurrent small jobs. As a result, in recent years, HPC designers have introduced ad-hoc topologies to offer better performance for these concurrent small jobs. In this paper, we present and formally define these topologies, which we call Quasi Fat Trees (QFTs). Specifically, we formulate the graph structure of these new topologies, and show that they perform better for concurrent small jobs. Furthermore, we derive a closed-form and fault-resilient contention-free routing algorithm for all global shift permutations. This routing optimizes the run-time of large computing jobs that utilize MPI collectives. Finally, we verify the algorithm by running its implementation as an OpenSM routing engine on various sizes of QFT topologies, and show that it exhibits good performance.
Keywords :
cloud computing; computer centres; parallel processing; software fault tolerance; topology; trees (mathematics); workstation clusters; DCN; HPC clouds; OpenSM routing engine; QFT topology; data center networks; fault-resilient closed-form routing; high-performance computing clusters; quasi fat trees; Clustering algorithms; Joining processes; Network topology; Ports (Computers); Routing; Topology; Vegetation; Fat Tree; HPC; Routing; Topology;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
High-Performance Interconnects (HOTI), 2014 IEEE 22nd Annual Symposium on
Conference_Location :
Mountain View, CA
Type :
conf
DOI :
10.1109/HOTI.2014.19
Filename :
6925717