Author :
Cappello, Franck
Author_Institution :
INRIA, Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA
Abstract :
Summary form only given. In this talk, we will explore some recent results concerning the execution of MPI applications in unstable environments. We will show that by extracting the fundamental characteristics of HPC applications, we can design new fault tolerance approaches that surpass existing ones. In particular, we will present a characterization of HPC applications and the design of a new family of fault tolerance protocols combining the benefits of coordinated checkpointing and message logging protocols.
Keywords :
cloud computing; fault tolerant computing; message passing; MPI applications; cloud environment; coordinated checkpointing; exascale environment; fault tolerance protocols; high performance computing; hostile environments; message logging; unstable environments
Conference_Title :
Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW), 2011 IEEE International Symposium on
Conference_Location :
Shanghai
Print_ISBN :
978-1-61284-425-1
DOI :
10.1109/IPDPS.2011.410