Title :
Selective checkpointing and rollbacks in multithreaded distributed systems
Author :
Kasbekar, Mangesh ; Das, Chita R.
Author_Institution :
Dept. of Comput. Sci. & Eng., Pennsylvania State Univ., University Park, PA, USA
Abstract :
Modern distributed systems are often multithreaded and object-oriented in their design. They require efficient techniques to checkpoint and restore their state in order to improve fault tolerance. Traditional process-based distributed checkpointing and rollback algorithms suffer from the problem of false dependencies, which makes them rigid and inefficient for use with modern systems. In this paper, we develop protocols that can selectively checkpoint (and roll back) some threads of a distributed system while leaving others untouched, and that still ensure the consistency of the state resulting from such a partial rollback.
Keywords :
multi-threading; protocols; software fault tolerance; system recovery; distributed checkpointing; false dependencies; fault tolerance; multithreaded distributed systems; object-oriented design; partial rollback; process-based techniques; selective checkpointing; selective rollbacks; state consistency; state restoration; Checkpointing; Computer science; Design engineering; Fault tolerant systems; Modems; Multithreading; Programming profession; Yarn
Conference_Title :
21st International Conference on Distributed Computing Systems, 2001
Conference_Location :
Mesa, AZ
Print_ISBN :
0-7695-1077-9
DOI :
10.1109/ICDSC.2001.918931