DocumentCode :
668173
Title :
Design of network topology aware scheduling services for large InfiniBand clusters
Author :
Subramoni, Hari ; Bureddy, D. ; Kandalla, Krishna ; Schulz, K. ; Barth, B. ; Perkins, J. ; Arnold, Martin ; Panda, Dhabaleswar K.
Author_Institution :
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
fYear :
2013
fDate :
23-27 Sept. 2013
Firstpage :
1
Lastpage :
8
Abstract :
The goal of any scheduler is to satisfy users' demands for computation and achieve good overall system utilization by efficiently assigning jobs to resources. However, current state-of-the-art scheduling techniques do not intelligently balance node allocation based on the total bandwidth available between switches, which leads to oversubscription. Additionally, poor placement of processes can lead to network congestion and poor performance. In this paper, we explore the design of a network-topology-aware plugin for the SLURM job scheduler for modern InfiniBand-based clusters. We present designs to enhance the performance of applications with varying communication characteristics. Through our techniques, we are able to considerably reduce the amount of network contention observed during Alltoall / FFT operations. The results of our experimental evaluation indicate that our proposed technique delivers up to a 9% improvement in the communication time of P3DFFT at 512 processes. We also see that our techniques increase the performance of microbenchmarks that rely on point-to-point operations by up to 40% for all message sizes. Our techniques also improve the throughput of a 512-core cluster by up to 8%.
Keywords :
computer network performance evaluation; resource allocation; scheduling; telecommunication congestion control; telecommunication network topology; telecommunication switching; Alltoall operations; FFT operations; InfiniBand-based clusters; P3DFFT; SLURM job scheduler; cluster resource management; communication time; network congestion; network contention; network topology aware scheduling services; network-topology-aware plugin; overall system utilization; performance enhancement; throughput improvement; Schedules; Switches; Three-dimensional displays; Topology; Cluster Technology; Cluster resource management; InfiniBand; Network Topology; Scheduling;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cluster Computing (CLUSTER), 2013 IEEE International Conference on
Conference_Location :
Indianapolis, IN
Type :
conf
DOI :
10.1109/CLUSTER.2013.6702677
Filename :
6702677