Title :
Spare capacity as a means of fault detection and diagnosis in multiprocessor systems
Author :
Dahbura, Anton T. ; Sabnani, Krishan K. ; Hery, William J.
Author_Institution :
AT&T Bell Labs., Murray Hill, NJ, USA
fDate :
6/1/1989
Abstract :
A technique for detecting and diagnosing faults at the processor level in a multiprocessor system is described. Whenever possible, a process is assigned to two processors: the processor to which it would normally be assigned (primary) and an additional processor that would otherwise be idle (secondary). Two strategies are described and analyzed: one that is preemptive and another that is nonpreemptive. It is shown that, for moderately loaded systems, a sufficient percentage of processes can be performed redundantly using the system's spare capacity to provide a basis for fault detection and diagnosis with virtually no degradation of response time. A multiprocessor that uses the approach for detecting faults at the processor level is described.
Keywords :
fault tolerant computing; multiprocessing systems; redundancy; system recovery; detecting faults; diagnosis; fault detection; multiprocessor systems; nonpreemptive; preemptive; processor level; response time; spare capacity; Automatic generation control; Capacity planning; Costs; Decision trees; Fault detection; Fault diagnosis; Hardware; Logic design; Multiprocessing systems; System testing
Journal_Title :
Computers, IEEE Transactions on