Title :
Fair share on high performance computing systems: what does fair really mean?
Author :
Kleban, Stephen D. ; Clearwater, Scott H.
Author_Institution :
Sandia Nat. Labs., Albuquerque, NM, USA
Abstract :
We report on a performance evaluation of a Fair Share system at the ASCI Blue Mountain supercomputer cluster. We study the impacts of share allocation under Fair Share on wait times and expansion factor. We also measure the Service Ratio, a typical figure of merit for Fair Share systems, with respect to a number of job parameters. We conclude that Fair Share does little to alter important performance metrics such as expansion factor. This leads to the question of what Fair Share means on cluster machines. The essential difference between Fair Share on a uni-processor and a cluster is that the workload on a cluster is not fungible in space or time. We find that cluster machines must be highly utilized and support checkpointing in order for Fair Share to function more closely to the spirit in which it was originally developed.
Keywords :
mainframes; performance evaluation; processor scheduling; resource allocation; Fair Share system; cluster machines; expansion factor; performance metrics; share allocation; supercomputer cluster; Checkpointing; Computational modeling; Grid computing; High performance computing; Laboratories; Measurement; Physics computing; Predictive models; Resource management; Supercomputers
Conference_Title :
Proceedings of the 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid 2003)
Print_ISBN :
0-7695-1919-9
DOI :
10.1109/CCGRID.2003.1199363