DocumentCode
3179318
Title
A flexible clustered approach to high availability
Author
Hughes-Fenchel, G.
Author_Institution
Lucent Technol., Murray Hill, NJ, USA
fYear
1997
fDate
24-27 June 1997
Firstpage
314
Lastpage
318
Abstract
The Reliable Clustered Computing project created a system that enables applications to improve the reliability of off-the-shelf computers from a typical 99% (about 90 hours of downtime per year) to 99.99% (under one hour of downtime per year) in a cost-effective manner. The chief constraints were the need to achieve high reliability while minimizing cost and maintaining vendor independence. This was realized by creating a vendor-independent clustered configuration comprising two or more computers capable of recovering from hardware or software errors by restarting one or more processes on the current machine or by failing over one or more processes to another machine. Only two inexpensive custom hardware components were required for this solution: a WatchDog, to monitor component status, and a PowerDog, to control electrical power to processing elements (and optional peripherals). The bulk of the functionality was provided by software.
Keywords
fault tolerant computing; reliability; system recovery; PowerDog; Reliable Clustered Computing; WatchDog; clustered configuration; component status; electrical power; high availability; high reliability; off the shelf computers; Application software; Availability; Computer industry; Condition monitoring; Fault detection; Hardware; Maintenance; Telecommunication computing; Virtual machine monitors; Virtual machining
fLanguage
English
Publisher
IEEE
Conference_Titel
Fault-Tolerant Computing, 1997. FTCS-27. Digest of Papers., Twenty-Seventh Annual International Symposium on
Conference_Location
Seattle, WA, USA
ISSN
0731-3071
Print_ISBN
0-8186-7831-3
Type
conf
DOI
10.1109/FTCS.1997.614105
Filename
614105