DocumentCode :
2852481
Title :
Simulating Failures on Large-Scale Systems
Author :
Desai, Narayan ; Lusk, Ewing ; Buettner, Daniel ; Cherry, A. ; Voran, Theron
Author_Institution :
Math. & Comput. Sci. Div., Argonne Nat. Lab., Argonne, IL
fYear :
2008
fDate :
8-12 Sept. 2008
Firstpage :
103
Lastpage :
108
Abstract :
Developing fault management mechanisms is a difficult task because of the unpredictable nature of failures. In this paper, we present a fault simulation framework for Blue Gene/P systems implemented as a part of the Cobalt resource manager. The primary goal of this framework is to support system software development. We also present a hardware diagnostic system that we have implemented using this framework.
Keywords :
fault simulation; software fault tolerance; system recovery; Blue Gene/P systems; Cobalt resource manager; fault management; large-scale systems; Cobalt; Computational modeling; Computer science; Computer simulation; Hardware; Mesh networks; Parallel processing; Resource management; System software
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel Processing - Workshops, 2008. ICPP-W '08. International Conference on
Conference_Location :
Portland, OR
ISSN :
1530-2016
Print_ISBN :
978-0-7695-3375-9
Electronic_ISBN :
1530-2016
Type :
conf
DOI :
10.1109/ICPP-W.2008.31
Filename :
4626787