• DocumentCode
    327893
  • Title

    An experimental study about diskless checkpointing

  • Author

    Silva, Luis M. ; Silva, João Gabriel

  • Author_Institution
    Dept. de Engenharia Inf., Coimbra Univ., Portugal
  • Volume
    1
  • fYear
    1998
  • fDate
    25-27 Aug 1998
  • Firstpage
    395
  • Abstract
    Checkpointing and rollback recovery is a very effective technique to tolerate the occurrence of failures. Usually, the checkpoint data is saved in disk files. However, in some situations the disk operation may result in a considerable performance overhead. Alternative solutions would make use of main memory to maintain the checkpoint data. The paper presents two main-memory checkpointing schemes that can be used in any parallel machine without requiring any change to the hardware: one scheme saves the checkpoints in the memory of other processors, while the other is based on a parity approach. Both techniques have been implemented and evaluated in a commercial parallel machine. The results clearly show the superiority of one of those schemes.
  • Keywords
    fault tolerant computing; parallel machines; parallel programming; storage management; system recovery; checkpoint data; commercial parallel machine; disk operation; diskless checkpointing; experimental study; memory checkpointing schemes; parity approach; performance overhead; rollback recovery; Checkpointing; Computer crashes; Fault tolerance; Hardware; Maintenance; Parallel machines; Random access memory; Read-write memory; Workstations; Writing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Euromicro Conference, 1998. Proceedings. 24th
  • Conference_Location
    Vasteras
  • ISSN
    1089-6503
  • Print_ISBN
    0-8186-8646-4
  • Type

    conf

  • DOI
    10.1109/EURMIC.1998.711832
  • Filename
    711832
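
The abstract names two main-memory schemes but gives no implementation details. The following is only an illustrative sketch of the general ideas, not the authors' code: it assumes that the "parity approach" means a bytewise XOR over equal-sized checkpoints, and it uses in-process byte strings to stand in for the memories of separate processors. All function names are hypothetical.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("checkpoints must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def neighbor_copies(checkpoints: list[bytes]) -> dict[int, bytes]:
    """Scheme 1 (assumed ring layout): processor (i + 1) % n keeps a
    copy of processor i's checkpoint in its own memory."""
    n = len(checkpoints)
    return {(i + 1) % n: checkpoints[i] for i in range(n)}


def parity_of(checkpoints: list[bytes]) -> bytes:
    """Scheme 2 (assumed XOR parity): one parity block covers all
    checkpoints and would be held by a dedicated processor."""
    return reduce(xor_blocks, checkpoints)


def recover(survivors: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single lost checkpoint from the parity block and
    the checkpoints of the surviving processors."""
    return reduce(xor_blocks, survivors, parity)


if __name__ == "__main__":
    cps = [b"aaaa", b"bbbb", b"cccc"]
    parity = parity_of(cps)
    # Simulate losing processor 1 and rebuilding its checkpoint.
    rebuilt = recover([cps[0], cps[2]], parity)
    print(rebuilt == cps[1])
```

Under these assumptions, the neighbor scheme stores one full extra copy per processor, while the parity scheme stores a single block of checkpoint size for the whole group but tolerates only one lost checkpoint at a time.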