DocumentCode :
650670
Title :
Energy Efficient Fault Tolerance for High Performance Computing (HPC) in the Cloud
Author :
Egwutuoha, Ifeanyi P. ; Shiping Chen ; Levy, David ; Selic, Bran ; Calvo, Rodrigo
Author_Institution :
Sch. of Electr. & Inf. Eng., Univ. of Sydney, Sydney, NSW, Australia
fYear :
2013
fDate :
June 28 2013-July 3 2013
Firstpage :
762
Lastpage :
769
Abstract :
With cloud computing, a large number of Virtual Machines (VMs) can be provisioned to form a high performance computing (HPC) system that runs computation-intensive applications under the Hardware as a Service (HaaS) model. Fault Tolerance (FT) for HPC in the cloud is an increasingly challenging issue, because any fault during execution forces the application to be re-run, which costs time, money and energy. The energy consumption of HPC systems in the cloud has increased significantly as a result of re-running applications and of fault tolerance mechanisms themselves (e.g., redundant computing). In this paper we present an energy efficient fault tolerance approach for HPC in the cloud. We develop a generic FT algorithm for HPC systems in the cloud. Our algorithm uses a proactive process-level migration approach; however, it does not rely on a spare node or on redundant computing prior to the prediction of a failure. Our experimental results, obtained from a real cloud execution environment, show that the energy utilization of HPC in the cloud while providing fault tolerance can be reduced by as much as 30%.
Keywords :
cloud computing; energy conservation; energy consumption; parallel processing; power aware computing; software fault tolerance; FT algorithm; HPC systems; HaaS model; VM; cloud computing; cloud execution environment; computation-intensive applications; energy consumption; energy efficient fault tolerance; energy utilization; hardware as a service model; high performance computing; proactive process-level migration approach; virtual machines; Cloud computing; Fault tolerance; Fault tolerant systems; Hardware; Monitoring; Prediction algorithms; Temperature sensors; HPC; HaaS; cloud computing; computation-intensive applications; proactive fault tolerance; process-level migrations;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Cloud Computing (CLOUD), 2013 IEEE Sixth International Conference on
Conference_Location :
Santa Clara, CA
Print_ISBN :
978-0-7695-5028-2
Type :
conf
DOI :
10.1109/CLOUD.2013.69
Filename :
6676767