• DocumentCode
    2334596
  • Title

    Portable transparent checkpointing for distributed shared memory

  • Author

    Silva, Luis M. ; Silva, João Gabriel ; Chapple, Simon

  • Author_Institution
    Dept. de Engenharia Inf., Coimbra Univ., Portugal
  • fYear
    1996
  • fDate
    6-9 Aug. 1996
  • Firstpage
    422
  • Lastpage
    431
  • Abstract
    We present a checkpointing mechanism for a DSM system that, in spite of being invisible to the programmer, is quite efficient and portable. It is efficient because it is nonblocking, coordinated and thus domino-effect free. It offers some portability because it is built on top of MPI and uses only the services offered by MPI and a POSIX compliant local file system. As far as we know, this is the first real implementation of such a scheme for DSM. Along with the description of the algorithms used, we present experimental results obtained in a cluster of workstations, and discuss many insights that came out of the implementation effort. We hope that our research shows that efficient, transparent and portable checkpointing is viable for DSM systems.
  • Keywords
    Unix; distributed memory systems; message passing; parallel algorithms; shared memory systems; software portability; system recovery; MPI; Message Passing Interface; POSIX compliant local file system; distributed shared memory systems; domino-effect free; nonblocking mechanism; parallel algorithms; portable transparent checkpointing; workstation cluster; Checkpointing; Clustering algorithms; Computer crashes; Distributed computing; Fault tolerant systems; File systems; Parallel machines; Programming profession; Scalability; Workstations;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Distributed Computing, 1996., Proceedings of 5th IEEE International Symposium on
  • Conference_Location
    Syracuse, NY, USA
  • ISSN
    1082-8907
  • Print_ISBN
    0-8186-7582-9
  • Type

    conf

  • DOI
    10.1109/HPDC.1996.546213
  • Filename
    546213