Title :
Chameleon: a software infrastructure for adaptive fault tolerance
Author :
Bagchi, S. ; Whisnant, K. ; Kalbarczyk, Z. ; Iyer, R.K.
Author_Institution :
Center for Reliable & High Performance Comput., Illinois Univ., Urbana, IL, USA
Abstract :
This paper presents Chameleon, an adaptive software infrastructure for concurrently supporting different reliability levels in the same networked environment. Traditionally, fault tolerance has been provided through dedicated hardware, dedicated software, or a combination of both. Hardware solutions from manufacturers such as Tandem have provided dedicated fault-tolerant machines with extensive hardware redundancy. Unfortunately, such solutions offer a level of fault tolerance that remains fixed throughout the lifetime of the system. Software solutions, employed in distributed environments, replicate services in software to provide the requisite reliability level. However, to benefit from such solutions, applications must be written with the intent of running in such an environment. The benefits of such middleware are therefore unavailable to off-the-shelf applications. In contemporary networked computing systems, a broad range of commercial and scientific applications, with potentially varying reliability requirements, need to coexist. It is neither cost-effective nor feasible to provide a dedicated platform for hardware-based fault tolerance for each application, or to rewrite each application to take advantage of specialized software middleware. We propose Chameleon as an infrastructure that provides adaptive levels of dependability to off-the-shelf applications running on off-the-shelf, unreliable hardware.
Keywords :
fault tolerant computing; network operating systems; Chameleon; adaptive software infrastructure; fault tolerance; networked environment; reliability levels; Application software; Computer networks; Costs; Fault tolerance; Fault tolerant systems; Hardware; Manufacturing; Middleware; Redundancy; Software systems
Conference_Title :
Computer Performance and Dependability Symposium, 1998. IPDS '98. Proceedings. IEEE International
Conference_Location :
Durham, NC
Print_ISBN :
0-8186-8679-0
DOI :
10.1109/IPDS.1998.707734