Title :
Reducing interprocessor dependence in recoverable distributed shared memory
Author :
Janssens, Bob ; Fuchs, W. Kent
Author_Institution :
Coordinated Sci. Lab., Illinois Univ., Urbana, IL, USA
Abstract :
Checkpointing techniques in parallel systems use dependency tracking and/or message logging to ensure that a system rolls back to a consistent state. Traditional dependency tracking in distributed shared memory (DSM) systems is expensive because of high communication frequency. In this paper we show that, if designed correctly, a DSM system only needs to consider dependencies due to the transfer of blocks of data, resulting in reduced dependency tracking overhead and reduced potential for rollback propagation. We develop an ownership timestamp scheme to tolerate the loss of block state information and develop a passive server model of execution where interactions between processors are considered atomic. With our scheme, dependencies are significantly reduced compared to the traditional message-passing model.
Keywords :
distributed memory systems; fault tolerant computing; block state information; checkpointing techniques; dependency tracking; interprocessor dependence; message logging; ownership timestamp scheme; parallel systems; passive server model; recoverable distributed shared memory; rollback propagation; Application software; Checkpointing; Concurrent computing; Contracts; Frequency; Hardware; Laboratories; Message passing; NASA; Software systems
Conference_Title :
Proceedings of the 13th Symposium on Reliable Distributed Systems, 1994
Conference_Location :
Dana Point, CA
Print_ISBN :
0-8186-6575-0
DOI :
10.1109/RELDIS.1994.336911
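The abstract's central claim is that a correctly designed DSM system need only track dependencies created by block transfers, not by every protocol message. The toy sketch below illustrates why that shrinks rollback propagation. It is not the authors' protocol. The ownership timestamp scheme and the passive server model are not modeled. The trace, the message kinds, and the function names `dependencies` and `rollback_set` are all hypothetical. The sketch assumes that every processor checkpointed before the first event and that no messages are logged. Under those assumptions, rolling a processor back undoes everything it sent, so each receiver is left holding an orphan message and must roll back too. The claim that non-block messages can safely be ignored is taken from the abstract, not derived here.

```python
from collections import defaultdict, deque

# Hypothetical event trace: (kind, sender, receiver). Assumes all processors
# checkpointed before the first event and that no messages are logged.
TRACE = [
    ("request", 1, 0),     # P1 asks owner P0 for a block
    ("block", 0, 1),       # P0 transfers the block's data to P1
    ("invalidate", 2, 1),  # P2 invalidates P1's copy before writing
    ("ack", 1, 2),         # P1 acknowledges the invalidation
]


def dependencies(trace, block_only):
    """Map each sender to the set of receivers that depend on it."""
    dependents = defaultdict(set)
    for kind, sender, receiver in trace:
        # The traditional message-passing view counts every message; the
        # block-transfer view (assumed safe, per the abstract) counts only
        # messages that carry block data.
        if block_only and kind != "block":
            continue
        dependents[sender].add(receiver)
    return dependents


def rollback_set(failed, dependents):
    """Processors forced to roll back when `failed` restores its checkpoint."""
    rolled_back = {failed}
    queue = deque([failed])
    while queue:
        p = queue.popleft()
        # Everything p sent since its checkpoint is undone, so each receiver
        # now holds an orphan message and must roll back as well.
        for q in dependents[p]:
            if q not in rolled_back:
                rolled_back.add(q)
                queue.append(q)
    return rolled_back


for failed in (0, 2):
    full = rollback_set(failed, dependencies(TRACE, block_only=False))
    blocks = rollback_set(failed, dependencies(TRACE, block_only=True))
    print(f"P{failed} fails: message-passing {sorted(full)}, block-only {sorted(blocks)}")
```

In this trace, a failure of P2 forces all three processors to roll back when every message counts as a dependency. When only the block transfer counts, P2 rolls back alone. A failure of P0 still drags P1 back, because P1 received data that P0 produced.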