DocumentCode :
3549421
Title :
How resilient are distributed f fault/intrusion-tolerant systems?
Author :
Sousa, Paulo ; Neves, Nuno Ferreira ; Veríssimo, Paulo
Author_Institution :
Lisbon Univ., Portugal
fYear :
2005
fDate :
28 June-1 July 2005
Firstpage :
98
Lastpage :
107
Abstract :
Fault-tolerant protocols, asynchronous and synchronous alike, make stationary fault assumptions: only a fraction f of the total n nodes may fail. Whilst a synchronous protocol is expected to have a bounded execution time, an asynchronous one may execute for an arbitrary amount of time, possibly sufficient for f+1 nodes to fail. This can compromise the safety of the protocol and ultimately the safety of the system. Recent papers propose asynchronous protocols that can tolerate any number of faults over the lifetime of the system, provided that at most f nodes become faulty during a given interval. This is achieved through the so-called proactive recovery, which consists of periodically rejuvenating the system. Proactive recovery in asynchronous systems, though a major breakthrough, has some limitations which had not been identified before. In this paper, we introduce a system model expressive enough to represent these problems which remained in oblivion with the classical models. We introduce the predicate exhaustion-safe, meaning freedom from exhaustion-failures. Based on it, we predict the extent to which fault/intrusion-tolerant distributed systems (synchronous and asynchronous) can be made to work correctly. Namely, our model predicts the impossibility of guaranteeing correct behavior of asynchronous proactive recovery systems as exist today. To prove our point, we give an example of how these problems impact an existing fault/intrusion-tolerant distributed system, the CODEX system, and having identified the problem, we suggest one (certainly not the only) way to tackle it.
Keywords :
fault tolerant computing; security of data; system recovery; CODEX system; asynchronous protocol; asynchronous systems; bounded execution time; distributed f fault-tolerant systems; exhaustion-failure; fault-tolerant protocol; intrusion-tolerant distributed systems; predicate exhaustion-safe; proactive recovery; synchronous protocol; Algorithm design and analysis; Delay; Fault diagnosis; Fault tolerance; Fault tolerant systems; Informatics; Predictive models; Protocols; Safety; Timing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Dependable Systems and Networks, 2005. DSN 2005. Proceedings. International Conference on
Print_ISBN :
0-7695-2282-3
Type :
conf
DOI :
10.1109/DSN.2005.55
Filename :
1467784