DocumentCode :
3244834
Title :
An Evaluation of Communication Factors on an Adaptive Control Strategy for Job Co-allocation in Multiple HPC Clusters
Author :
Qin, Jinhui ; Bauer, Michael A.
Author_Institution :
Dept. of Comput. Sci., Univ. of Western Ontario, London, ON, Canada
fYear :
2009
fDate :
8-11 Dec. 2009
Firstpage :
391
Lastpage :
398
Abstract :
Allocating multi-process jobs across multiple connected high performance computing clusters, i.e., job co-allocation, offers the possibility of more efficient use of computer resources, reduced turn-around time, and computations that use more processes than there are processors on any single cluster. Effective co-allocation ultimately depends on the inter-cluster communication cost. We previously introduced a scalable co-allocation strategy, the maximum bandwidth adjacent cluster set (MBAS) strategy, which used two thresholds to control job co-allocation: one dealing with inter-cluster links and one controlling job partitioning. We subsequently introduced the adaptive threshold control system (ATCS), which used a fuzzy control approach to dynamically adjust these thresholds within MBAS. Results suggested that using ATCS during MBAS job co-allocation could achieve an overall performance improvement. However, these results only considered jobs that involved either master-slave or all-to-all communications among constituent processes. In this paper, we extend this analysis by also considering jobs that exhibit 2D-mesh communication patterns and evaluate ATCS further.
Keywords :
adaptive control; fuzzy control; telecommunication control; workstation clusters; 2D-mesh communication pattern; adaptive threshold control system; communication factor; high performance computing; intercluster link; job coallocation; job partitioning; maximum bandwidth adjacent cluster set; multiple HPC cluster; scalable coallocation strategy; adaptive systems; bandwidth; communication system control; computer networks; control systems; costs; programmable control; resource management; high-performance computing clusters; job co-allocation
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel and Distributed Systems (ICPADS), 2009 15th International Conference on
Conference_Location :
Shenzhen
ISSN :
1521-9097
Print_ISBN :
978-1-4244-5788-5
Type :
conf
DOI :
10.1109/ICPADS.2009.36
Filename :
5395301