Title :
A Study on Job Co-Allocation in Multiple HPC Clusters
Author :
Qin, Jinhui ; Bauer, Michael
Author_Institution :
University of Western Ontario, Canada
Abstract :
To use HPC clusters more effectively for even larger computations, improve turn-around times, and better utilize compute resources, users are looking to interconnect multiple HPC clusters, creating a grid. To use such grids effectively, it may be desirable to split and co-allocate jobs requiring many processes across multiple clusters. While splitting a very large job across multiple clusters is an attractive possibility, the benefit, in terms of improved turn-around time, ultimately depends on the communication patterns between processes, the workload on the communication links, and the maximum bandwidth of the links. The objective of this work is to understand the impact of communication on multi-processor jobs in order to develop scheduling strategies and job allocation algorithms for multi-cluster grids that can accommodate communication factors. In this paper, we report on initial investigations of some co-allocation strategies. The evaluation is based on a simulator that has been implemented and validated experimentally across two HPC clusters.
Keywords :
Bandwidth; Clustering algorithms; Computer networks; Computer science; Costs; Grid computing; High performance computing; Processor scheduling; Resource management; Scheduling algorithm
Conference_Title :
High-Performance Computing in an Advanced Collaborative Environment, 2006. HPCS 2006. 20th International Symposium on
Print_ISBN :
0-7695-2582-2
DOI :
10.1109/HPCS.2006.8