DocumentCode :
3186483
Title :
Achieving dynamic workflow management system by applying provenance based checkpointing method
Author :
Kail, E. ; Kacsuk, P. ; Kozlovszky, M.
Author_Institution :
John von Neumann Fac. of Inf., Biotech Lab., Obuda Univ., Budapest, Hungary
fYear :
2015
fDate :
25-29 May 2015
Firstpage :
250
Lastpage :
253
Abstract :
Scientific workflows are data- and compute-intensive and may therefore run for days or even weeks on parallel and distributed infrastructures such as HPC systems and clouds. In an HPC environment, the number of failures that can arise during scientific workflow enactment can be high, so the use of fault tolerance techniques is unavoidable. The most frequently used fault tolerance techniques are job replication and checkpointing. Job replication relies on the assumption that a single failure is much more probable than simultaneous failures, whereas checkpointing saves certain states so that execution can later be restarted from the saved point. The effectiveness of checkpointing depends on the checkpointing interval, and a common technique is to adapt this interval dynamically. In this work we give a brief overview of the different checkpointing techniques and propose a new provenance-based dynamic checkpointing method.
Keywords :
checkpointing; parallel processing; software fault tolerance; workflow management software; HPC systems; checkpointing interval; cloud; distributed infrastructures; dynamic workflow management system; fault tolerance techniques; job replication; parallel infrastructures; provenance based checkpointing method; Checkpointing; Computer crashes; Fault tolerance; Fault tolerant systems; Heuristic algorithms; Libraries; Synchronization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information and Communication Technology, Electronics and Microelectronics (MIPRO), 2015 38th International Convention on
Conference_Location :
Opatija
Type :
conf
DOI :
10.1109/MIPRO.2015.7160274
Filename :
7160274