Title :
Using abstraction to improve fault tolerance
Author :
Castro, Miguel ; Rodrigues, Rodrigo ; Liskov, Barbara
Author_Institution :
Microsoft Res. Ltd, Cambridge, UK
Abstract :
Software errors are a major cause of outages and they are increasingly exploited in malicious attacks. Byzantine fault tolerance allows replicated systems to mask some software errors but it is expensive to deploy. The paper describes a replication technique, BFTA, which uses abstraction to reduce the cost of Byzantine fault tolerance and to improve its ability to mask software errors. BFTA reduces cost because it enables reuse of off-the-shelf service implementations. It improves availability because each replica can be repaired periodically using an abstract view of the state stored by correct replicas, and because each replica can run distinct or non-deterministic service implementations, which reduces the probability of common mode failures. We built an NFS service that allows each replica to run a different operating system. This example suggests that BFTA can be used in practice; the replicated file system required only a modest amount of new code, and preliminary performance results indicate that it performs comparably to the off-the-shelf implementations that it wraps.
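The idea of repairing a replica from an abstract view of the state, even when replicas run different implementations, can be illustrated with a toy sketch. This is not the protocol from the paper: the classes `DictStore` and `ListStore`, the methods `abstract_state` and `load_abstract_state`, and the `repair` function are all hypothetical names chosen for illustration, and the simple vote stands in for a real Byzantine agreement protocol. It assumes at most `f` faulty replicas out of `3f + 1`.

```python
from collections import Counter


class DictStore:
    """Hypothetical implementation A: keeps files in a dict."""

    def __init__(self):
        self._files = {}

    def write(self, name, data):
        self._files[name] = data

    def abstract_state(self):
        # Abstraction function: concrete state -> common abstract view.
        return tuple(sorted(self._files.items()))

    def load_abstract_state(self, state):
        # Inverse abstraction function: abstract view -> concrete state.
        self._files = dict(state)


class ListStore:
    """Hypothetical implementation B: keeps a log of writes."""

    def __init__(self):
        self._log = []

    def write(self, name, data):
        self._log.append((name, data))

    def abstract_state(self):
        latest = {}
        for name, data in self._log:
            latest[name] = data  # later writes override earlier ones
        return tuple(sorted(latest.items()))

    def load_abstract_state(self, state):
        self._log = list(state)


def repair(replicas, f):
    """Overwrite replicas whose abstract state differs from the majority view.

    Returns the indices of the replicas that were repaired.
    """
    counts = Counter(r.abstract_state() for r in replicas)
    state, votes = counts.most_common(1)[0]
    if votes < f + 1:
        raise RuntimeError("no abstract state vouched for by f+1 replicas")
    repaired = []
    for i, replica in enumerate(replicas):
        if replica.abstract_state() != state:
            replica.load_abstract_state(state)
            repaired.append(i)
    return repaired


# Four replicas (f = 1) running two different implementations.
replicas = [DictStore(), ListStore(), DictStore(), ListStore()]
for r in replicas:
    r.write("a", "1")
    r.write("b", "2")
    r.write("a", "3")

# Simulate a software error that corrupts one replica's concrete state.
replicas[1]._log.append(("b", "corrupt"))

repaired = repair(replicas, f=1)
```

Although the two stores keep different concrete representations, they agree on the abstract state, so the corrupted replica can be detected and rebuilt from the view held by the others.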
Keywords :
computer crime; operating systems (computers); replicated databases; software fault tolerance; software libraries; software reusability; Byzantine fault tolerance; NFS service; abstract view; abstraction; common mode failures; malicious attacks; nondeterministic service implementations; operating system; performance results; probability; replicated file system; replicated systems; replication technique; software errors masking; Availability; Computer errors; Computer science; Costs; Fault tolerance; Fault tolerant systems; Laboratories; Operating systems; Software libraries; Software systems;
Conference_Title :
Hot Topics in Operating Systems, 2001. Proceedings of the Eighth Workshop on
Print_ISBN :
0-7695-1040-X
DOI :
10.1109/HOTOS.2001.990057