DocumentCode :
1967196
Title :
Module Prototype for Online Failure Prediction for the IBM Blue Gene/L
Author :
Solano-Quinde, Lizandro D. ; Bode, Brett M.
Author_Institution :
Ames Lab, Scalable Comput. Lab., Iowa State Univ., Ames, IA
fYear :
2008
fDate :
18-20 May 2008
Firstpage :
470
Lastpage :
474
Abstract :
The growing complexity of scientific applications has led to the design and deployment of large-scale parallel systems. The IBM Blue Gene/L can hold more than 200K processors and was designed for high performance and reliability. However, failures in a parallel system of this scale are a major concern, since it has been demonstrated that failures significantly reduce system performance.
Keywords :
fault tolerant computing; parallel machines; system recovery; IBM Blue Gene/L; fault tolerance; large-scale parallel systems; online failure prediction; Checkpointing; Degradation; Fault tolerance; Fault tolerant systems; Information analysis; Large-scale systems; Pattern matching; Prototypes; Software prototyping; System performance; Blue Gene/L; Computer Fault Tolerance; Failure Analysis; Software Fault Tolerance;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Electro/Information Technology, 2008. EIT 2008. IEEE International Conference on
Conference_Location :
Ames, IA
Print_ISBN :
978-1-4244-2029-2
Electronic_ISBN :
978-1-4244-2030-8
Type :
conf
DOI :
10.1109/EIT.2008.4554349
Filename :
4554349