Title :
Resilient workflows for high-performance simulation platforms
Author :
Nguyên, Toàn ; Trifan, Laurentiu ; Désidéri, Jean-Antoine
Author_Institution :
INRIA, St. Ismier, France
Date :
June 28, 2010 - July 2, 2010
Abstract :
Workflow systems are considered here to support large-scale multiphysics simulations. Because the use of large distributed and parallel multi-core infrastructures is prone to software and hardware failures, the paper addresses the need for error recovery procedures. A new mechanism based on asymmetric checkpointing is presented. A rule-based implementation for a distributed workflow platform is detailed.
Keywords :
Aerodynamics; Fault tolerance; Fault tolerant systems; Optimization; Resumes; Software; Synchronization; Fault-tolerant computing; Large-scale scientific computing; parallelization of simulation; workflow systems
Conference_Title :
High Performance Computing and Simulation (HPCS), 2010 International Conference on
Conference_Location :
Caen, France
Print_ISBN :
978-1-4244-6827-0
DOI :
10.1109/HPCS.2010.5547153