Title :
DEFINE: a distributed fault injection and monitoring environment
Author :
Kao, Wei-lun ; Iyer, Ravishankar K.
Author_Institution :
Coordinated Sci. Lab., Illinois Univ., Urbana, IL, USA
Abstract :
This paper presents a distributed fault injection and monitoring environment (DEFINE) as a tool to evaluate system dependability, to investigate fault propagation, and to validate fault-tolerant mechanisms. DEFINE can inject both hardware faults (hardware-induced software errors) and software faults into any process running in a distributed system, in either user mode or supervisor mode, and monitor fault impact and propagation within software systems and among machines. It employs two fault injection techniques: (i) using hardware clock interrupts to control the time of fault injection and activation, and (ii) using software traps to inject all faults except communication faults and memory faults in the data/stack segment. Experiments conducted on six Sun SPARCstations to study system behavior under faults demonstrate the application of DEFINE.
Keywords :
fault tolerant computing; interrupts; programming environments; software fault tolerance; DEFINE; Sun SPARCstations; communication faults; distributed fault injection and monitoring environment; fault injection techniques; fault propagation; fault-tolerant mechanisms; hardware clock interrupts; hardware faults; hardware-induced software errors; memory faults; software faults; software traps; system dependability; Clocks; Condition monitoring; Hardware; High performance computing; Packaging; Software packages; Software systems; Software tools; Sun; Workstations;
Conference_Title :
Proceedings of IEEE Workshop on Fault-Tolerant Parallel and Distributed Systems, 1994
Conference_Location :
College Station, TX
Print_ISBN :
0-8186-6807-5
DOI :
10.1109/FTPDS.1994.494497