DocumentCode
2334596
Title
Portable transparent checkpointing for distributed shared memory
Author
Silva, Luis M. ; Silva, João Gabriel ; Chapple, Simon
Author_Institution
Dept. de Engenharia Inf., Coimbra Univ., Portugal
fYear
1996
fDate
6-9 Aug. 1996
Firstpage
422
Lastpage
431
Abstract
We present a checkpointing mechanism for a DSM system that, in spite of being invisible to the programmer, is quite efficient and portable. It is efficient because it is nonblocking, coordinated and thus domino-effect free. It offers some portability because it is built on top of MPI and uses only the services offered by MPI and a POSIX compliant local file system. As far as we know, this is the first real implementation of such a scheme for DSM. Along with the description of the algorithms used, we present experimental results obtained in a cluster of workstations, and discuss many insights that came out of the implementation effort. We hope that our research shows that efficient, transparent and portable checkpointing is viable for DSM systems.
Keywords
Unix; distributed memory systems; message passing; parallel algorithms; shared memory systems; software portability; system recovery; MPI; Message Passing Interface; POSIX compliant local file system; distributed shared memory systems; domino-effect free; nonblocking mechanism; portable transparent checkpointing; workstation cluster; Checkpointing; Clustering algorithms; Computer crashes; Distributed computing; Fault tolerant systems; File systems; Parallel machines; Programming profession; Scalability; Workstations
fLanguage
English
Publisher
IEEE
Conference_Titel
High Performance Distributed Computing, 1996., Proceedings of 5th IEEE International Symposium on
Conference_Location
Syracuse, NY, USA
ISSN
1082-8907
Print_ISBN
0-8186-7582-9
Type
conf
DOI
10.1109/HPDC.1996.546213
Filename
546213