DocumentCode :
3636169
Title :
Using Cloud Constructs and Predictive Analysis to Enable Pre-Failure Process Migration in HPC Systems
Author :
James Brandt;Frank Chen;Vincent De Sapio;Ann Gentile;Jackson Mayo;Philippe Pébay;Diana Roe;David Thompson;Matthew Wong
Author_Institution :
Sandia National Laboratories, Livermore, CA, USA
Year :
2010
Firstpage :
703
Lastpage :
708
Abstract :
Accurate failure prediction, combined with efficient process migration facilities that include some Cloud constructs, can enable failure avoidance in large-scale high-performance computing (HPC) platforms. In this work, we demonstrate a prototype system that integrates our probabilistic failure prediction system with virtualization mechanisms and techniques to provide a whole-system approach to failure avoidance. The demonstration uses a failure scenario based on a real-world HPC case study.
Keywords :
"Fault tolerance","Resource virtualization","Large-scale systems","High performance computing","Checkpointing","Cloud computing","Application software","Grid computing","Laboratories","Investments"
Publisher :
IEEE
Conference_Title :
Cluster, Cloud and Grid Computing (CCGrid), 2010 10th IEEE/ACM International Conference on
Print_ISBN :
978-1-4244-6987-1
Type :
conf
DOI :
10.1109/CCGRID.2010.31
Filename :
5493402