Title :
A Locality-Driven Atomic Group Checkpoint Protocol
Author :
Wei, Zunce ; Li, Hon F. ; Goswami, Dhrubajyoti
Author_Institution :
Dept. of Comput. Sci., Concordia Univ., Montreal, Que.
Abstract :
This paper explores the use of locality of dependencies in large-scale distributed systems to develop efficient checkpoint strategies. Dependencies among processes evolve into message interactions, which often spread and affect recovery dependencies and logging requirements. On the other hand, message interactions are usually localized within small sub-regions formed in space and time. Aiming both to minimize message logging and to localize the effect of recovery, we propose a strategy that forms group checkpoints around such regions while selectively logging inter-region messages. A simple and efficient atomic group checkpoint (AGC) protocol is developed based on the locality information of a distributed computation, e.g., in agent communication protocol sessions in multi-agent systems. Atomicity guarantees consistency of the group checkpoint and uniformity of group logging, and hence minimizes logging overhead. The correctness of the AGC protocol is analyzed and proved through a generic checkpoint dependency graph (CDG) model, which captures the recovery dependency relations among checkpoints.
Keywords :
checkpointing; distributed processing; checkpoint dependency graph model; distributed computation; large-scale distributed systems; locality-driven atomic group checkpoint protocol; logging requirements; message interactions; message logging; recovery dependencies; Computer science; Concurrent computing; Delay; Distributed computing; Fault tolerant systems; Large-scale systems; Multiagent systems; Protocols; Runtime; Space technology
Conference_Title :
Seventh International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT '06), 2006
Conference_Location :
Taipei
Print_ISBN :
0-7695-2736-1
DOI :
10.1109/PDCAT.2006.11