Title :
Design for fault-tolerance in System ES/9000 Model 900
Author :
Spainhower, L. ; Isenberg, J. ; Chillarege, R. ; Berding, J.
Author_Institution :
IBM Corp., Poughkeepsie, NY, USA
Abstract :
The authors present the design for fault-tolerance in the IBM ES/9000 Model 900 high-end commercial processor. The design exploits circuit level concurrent-error detection, fault-identification, and reconfiguration with system level techniques when multiple functional resources are available. It provides true graceful degradation during central processor or channel reconfiguration and repair. The authors discuss the design point for this processor and the trade-offs involved; show the error detection and online repair process of a central processor with the work recovered on an alternate central processor, transparent to the application; describe dynamic path selection and the hot-pluggable channels; and illustrate the fault-tolerance techniques used in the level 1 cache and the central store.
Keywords :
fault tolerant computing; multiprocessing systems; IBM; central processor; central store; circuit level concurrent-error detection; design point; dynamic path selection; fault-identification; fault-tolerance; high-end commercial processor; hot-pluggable channels; level 1 cache; multiple functional resources; online repair process; reconfiguration; system ES model 900; system level techniques; true graceful degradation; Buildings; Circuit faults; Electrical fault detection; Fault detection; Fault diagnosis; Fault tolerant systems; Hardware; Latches; Logic; Power system modeling;
Conference_Title :
Fault-Tolerant Computing, 1992. FTCS-22. Digest of Papers., Twenty-Second International Symposium on
Conference_Location :
Boston, MA, USA
Print_ISBN :
0-8186-2875-8
DOI :
10.1109/FTCS.1992.243617