DocumentCode
3355243
Title
Design for fault-tolerance in system ES/9000 model 900
Author
Spainhower, L. ; Isenberg, J. ; Chillarege, R. ; Berding, J.
Author_Institution
IBM Corp., Poughkeepsie, NY, USA
fYear
1992
fDate
8-10 July 1992
Firstpage
38
Lastpage
47
Abstract
The authors present the design for fault-tolerance in the IBM ES/9000 Model 900 high-end commercial processor. The design exploits circuit level concurrent-error detection, fault-identification, and reconfiguration with system level techniques when multiple functional resources are available. It provides true graceful degradation during central processor or channel reconfiguration and repair. The authors discuss the design point for this processor and the trade-offs involved; show the error detection and online repair process of a central processor with the work recovered on an alternate central processor, transparent to the application; describe dynamic path selection and the hot-pluggable channels; and illustrate the fault-tolerance techniques used in the level 1 cache and the central store.
Keywords
fault tolerant computing; multiprocessing systems; IBM; central processor; central store; circuit level concurrent-error detection; design point; dynamic path selection; fault-identification; fault-tolerance; high-end commercial processor; hot-pluggable channels; level 1 cache; multiple functional resources; online repair process; reconfiguration; system ES model 900; system level techniques; true graceful degradation; Buildings; Circuit faults; Electrical fault detection; Fault detection; Fault diagnosis; Fault tolerant systems; Hardware; Latches; Logic; Power system modeling;
fLanguage
English
Publisher
ieee
Conference_Titel
Fault-Tolerant Computing, 1992. FTCS-22. Digest of Papers., Twenty-Second International Symposium on
Conference_Location
Boston, MA, USA
Print_ISBN
0-8186-2875-8
Type
conf
DOI
10.1109/FTCS.1992.243617
Filename
243617