DocumentCode :
3248356
Title :
Continuous adaptation for high performance throughput computing across distributed clusters
Author :
Walker, Edward
Author_Institution :
Texas Advanced Computing Center, University of Texas at Austin, Austin, TX
fYear :
2008
fDate :
Sept. 29 - Oct. 1, 2008
Firstpage :
369
Lastpage :
375
Abstract :
A job proxy is an abstraction for provisioning CPU resources. This paper proposes an adaptive algorithm for allocating job proxies to distributed host clusters with the objective of improving large-scale job ensemble throughput. Specifically, the paper proposes a decision metric for selecting appropriate pending job proxies for migration between host clusters, and a self-synchronizing Paxos-style distributed consensus algorithm for performing the migration of these selected job proxies. The algorithm is further described in the context of a concrete application, the MyCluster system, which implements a framework for submitting, managing and adapting job proxies across distributed high performance computing (HPC) host clusters. To date, the system has been used to provision many hundreds of thousands of CPUs for computational experiments requiring high throughput on HPC infrastructures like the NSF TeraGrid. Experimental evaluation of the proposed algorithm shows significant improvement in user job throughput: an average of 8% in simulation, and 15% in a real-world experiment.
Keywords :
grid computing; workstation clusters; CPU resources; MyCluster system; TeraGrid; distributed high performance computing; distributed host clusters; job proxy; self-synchronizing Paxos-style distributed consensus; Adaptive algorithm; Clustering algorithms; Computational modeling; Concrete; Distributed computing; High performance computing; Large-scale systems; Processor scheduling; Software systems; Throughput
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Cluster Computing, 2008 IEEE International Conference on
Conference_Location :
Tsukuba
ISSN :
1552-5244
Print_ISBN :
978-1-4244-2639-3
Electronic_ISBN :
1552-5244
Type :
conf
DOI :
10.1109/CLUSTR.2008.4663797
Filename :
4663797