Title :
Optimizing jobs timeouts on clusters and production grids
Author :
Glatard, T. ; Montagnat, Johan ; Pennec, X.
Author_Institution :
CNRS, Paris
Abstract :
This paper presents a method to optimize the timeout value of computing jobs. It relies on a model of the job execution time that represents the job management system latency as a random variable. It also takes into account a proportion of outliers, so that it can model either reliable clusters or production grids, where faults cause job loss. Job management systems are first studied using classical distributions. Different behaviors are exhibited, depending on the weight of the tail of the distribution and on the proportion of outliers. Experimental results are then shown, based on the latency distribution and outlier ratios measured on the EGEE grid infrastructure. These results show that using the optimal timeout value provided by our method reduces the impact of outliers and leads to a 1.36 speed-up even for reliable systems without outliers.
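The timeout trade-off described above can be illustrated with a small numerical sketch. It is not the authors' code and rests on stated assumptions: each submission is independent, a fraction ρ of submissions are outliers that never return, a job is cancelled and resubmitted when its latency exceeds the timeout t, and the latency R of non-outlier submissions follows a log-normal distribution (an illustrative choice; the paper uses measured EGEE data). Under these assumptions the number of failed attempts is geometric with per-attempt success probability q = (1 − ρ)F_R(t), which gives the expected latency E_J(t) = (1/F_R(t)) ∫₀ᵗ u f_R(u) du + t (1/((1 − ρ)F_R(t)) − 1). The function names (`expected_latency`, `optimal_timeout`) and the grid search are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def lognormal_cdf(t, mu, sigma):
    """F_R(t) for an assumed log-normal latency R."""
    if t <= 0:
        return 0.0
    return norm_cdf((math.log(t) - mu) / sigma)


def lognormal_partial_mean(t, mu, sigma):
    """Closed form of the integral of u * f_R(u) from 0 to t (log-normal R)."""
    if t <= 0:
        return 0.0
    return math.exp(mu + sigma ** 2 / 2) * norm_cdf(
        (math.log(t) - mu - sigma ** 2) / sigma)


def expected_latency(t, rho, mu, sigma):
    """E_J(t): mean successful-attempt latency plus t per expected failed attempt."""
    F = lognormal_cdf(t, mu, sigma)
    q = (1.0 - rho) * F  # probability that one attempt finishes before the timeout
    if q <= 0.0:
        return math.inf  # no attempt can ever succeed
    return lognormal_partial_mean(t, mu, sigma) / F + t * (1.0 / q - 1.0)


def optimal_timeout(rho, mu, sigma, t_max, steps=10000):
    """Grid search for the timeout in (0, t_max] that minimizes E_J(t)."""
    best_t, best_e = None, math.inf
    for i in range(1, steps + 1):
        t = t_max * i / steps
        e = expected_latency(t, rho, mu, sigma)
        if e < best_e:
            best_t, best_e = t, e
    return best_t, best_e


if __name__ == "__main__":
    # Hypothetical parameters: median latency of 300 s, sigma = 1, 5% outliers.
    print(optimal_timeout(0.05, math.log(300.0), 1.0, 20000.0))
```

With ρ > 0, E_J(t) grows without bound as t increases, because each lost job costs a full timeout; with ρ = 0 it tends to the plain mean latency, so any gain from a finite timeout then comes from the shape of the latency tail.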
Keywords :
grid computing; production engineering computing; workflow management software; EGEE grid infrastructure; computing jobs; job execution time; job management system latency; job timeouts; latency distribution; production grids; timeout value; Delay; Design optimization; Grid computing; Hardware; Job production systems; Large-scale systems; Optimization methods; Probability distribution; Random variables; Throughput;
Conference_Title :
Cluster Computing and the Grid, 2007. CCGRID 2007. Seventh IEEE International Symposium on
Conference_Location :
Rio de Janeiro
Print_ISBN :
0-7695-2833-3
DOI :
10.1109/CCGRID.2007.78