DocumentCode :
2310868
Title :
Adaptive Checkpoint Replication for Supporting the Fault Tolerance of Applications in the Grid
Author :
Luckow, André ; Schnor, Bettina
Author_Institution :
Inst. of Comput. Sci., Univ. of Potsdam, Potsdam
fYear :
2008
fDate :
10-12 July 2008
Firstpage :
299
Lastpage :
306
Abstract :
A major challenge in a dynamic Grid with thousands of interconnected machines is fault tolerance. The more resources and components involved, the more complicated and error-prone the system becomes. Migol is an adaptive Grid middleware that addresses the fault tolerance of Grid applications and services by providing the capability to recover applications from checkpoint files automatically. A critical aspect of automatic recovery is the availability of checkpoint files: if a resource becomes unavailable, it is very likely that the associated storage is also unreachable, e.g., due to a network partition. A strategy to increase the availability of checkpoints is replication. In this paper, we present the Checkpoint Replication Service. A key feature of this service is the ability to automatically replicate and monitor checkpoints in the Grid.
Keywords :
checkpointing; grid computing; middleware; software fault tolerance; adaptive Grid middleware; adaptive checkpoint replication; checkpoint replication service; fault tolerance; Application software; Availability; Checkpointing; Computer applications; Computer networks; Fault tolerance; Humans; Libraries; Middleware; Resonance light scattering; Checkpointing; Grid Computing; Replication;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Network Computing and Applications, 2008. NCA '08. Seventh IEEE International Symposium on
Conference_Location :
Cambridge, MA
Print_ISBN :
978-0-7695-3192-2
Electronic_ISBN :
978-0-7695-3192-2
Type :
conf
DOI :
10.1109/NCA.2008.38
Filename :
4579677