Title :
Banking on Decoupling: Budget-Driven Sustainability for HPC Applications on EC2 Spot Instances
Author_Institution :
Comput. & Inf. Sci. Dept., Temple Univ., Philadelphia, PA, USA
Abstract :
Cloud providers are auctioning their excess capacity using dynamically priced virtual instances. These spot instances provide significant savings compared to on-demand or fixed price instances. The users willing to use these resources are asked to provide a maximum bid price per hour, and the cloud provider runs the instances as long as the market price is below the user's bid price. By using such resources, the users are exposed explicitly to failures and need to adapt their applications to provide some level of fault tolerance. In this paper we expose the effect of bidding in the case of virtual HPC clusters composed of spot instances. We describe the interesting effect of uniform versus non-uniform bidding, in terms of failure rate and failure model. We propose an initial attempt to deal with the problem of predicting the runtime of a parallel application under various bidding strategies and various system parameters. We describe the relationship between bidding strategies and programming models. We build a preliminary optimization model that uses real price traces from Amazon Web Services as inputs, as well as instrumented values related to the processing and network capacities of cluster instances on the EC2 services. Our results show preliminary insights into the relationship between non-uniform bidding and application scaling strategies.
Keywords :
Web services; budgeting; fault tolerance; pricing; sustainable development; Amazon Web service; EC2 spot instances; HPC application; application scaling strategy; banking; bidding strategy; budget driven sustainability; cloud providers; decoupling; excess capacity; failure model; failure rate; fault tolerance; fixed price instance; maximum bid price; nonuniform bidding; on demand; parallel application; preliminary optimization model; priced virtual instance; programming model; virtual HPC clusters; Approximation methods; Computational modeling; Fault tolerance; Fault tolerant systems; Programming; Runtime; Timing; Auction-based cloud computing; Cloud virtual clusters; Cloud-based Fault Tolerance; Cost-aware Optimization models; Decoupling Parallel Programming Models; Spot Instances;
Conference_Title :
Reliable Distributed Systems (SRDS), 2012 IEEE 31st Symposium on
Conference_Location :
Irvine, CA
Print_ISBN :
978-1-4673-2397-0
DOI :
10.1109/SRDS.2012.11