DocumentCode :
3322504
Title :
Proactive Fault Tolerance Using Preemptive Migration
Author :
Engelmann, C. ; Vallee, G.R. ; Naughton, T. ; Scott, S.L.
Author_Institution :
Comput. Sci. & Math. Div., Oak Ridge Nat. Lab., Oak Ridge, TN
fYear :
2009
fDate :
18-20 Feb. 2009
Firstpage :
252
Lastpage :
257
Abstract :
Proactive fault tolerance (FT) in high-performance computing is a concept that prevents compute node failures from impacting running parallel applications by preemptively migrating application parts away from nodes that are about to fail. This paper provides a foundation for proactive FT by defining its architecture and classifying implementation options. This paper further relates prior work to the presented architecture and classification, and discusses the challenges ahead for needed supporting technologies.
Keywords :
fault tolerant computing; parallel processing; system recovery; high-performance computing; parallel application; preemptive migration; proactive fault tolerance architecture; system failure; Application software; Computer architecture; Computer networks; Concurrent computing; Condition monitoring; Degradation; Distributed computing; Fault tolerance; Laboratories; Resource management
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel, Distributed and Network-based Processing, 2009 17th Euromicro International Conference on
Conference_Location :
Weimar
ISSN :
1066-6192
Print_ISBN :
978-0-7695-3544-9
Type :
conf
DOI :
10.1109/PDP.2009.31
Filename :
4912941