DocumentCode :
2267460
Title :
Towards Easy-to-Use Checkpointing of MPI Applications within CLUSTERIX
Author :
Czarnul, Pawel ; Urbaniak, Arkadiusz ; Fraczak, Marcin ; Dyczkowski, Maciej ; Balcerek, Bartlomiej
Author_Institution :
Gdansk University of Technology, Poland
fYear :
2004
fDate :
7-10 Sept. 2004
Firstpage :
390
Lastpage :
393
Abstract :
While many kernel-level and user-level libraries and systems support checkpointing running processes and resuming their operation, it is still very difficult to provide an easy-to-use tool that assists in checkpointing parallel applications. In this work, we aim to develop an easy-to-use, user-guided library that supports checkpointing of parallel MPI applications executed within the CLUSTERIX environment, i.e., a collection of distributed HPC clusters. We propose a programmer-assisted approach with process state packing and unpacking at the code level for SPMD HPC applications. Although the library is at an early stage of development, we present checkpoint/restart times and application execution times (with execution interrupted by checkpointing) for the proposed approach, compared with the same application linked with the ckpt user-level library.
Keywords :
Checkpointing Parallel Applications; Parallel Software Environments; Process Checkpointing; Application software; Checkpointing; Computer architecture; Fault tolerance; Informatics; Kernel; Resumes; Signal processing; Sockets; Software libraries
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel Computing in Electrical Engineering, 2004. PARELEC 2004. International Conference on
Print_ISBN :
0-7695-2080-4
Type :
conf
DOI :
10.1109/PCEE.2004.72
Filename :
1376788