Title :
The recovery language approach for software-implemented fault tolerance
Author :
De Florio, Vincenzo ; Deconinck, Geert ; Lauwereins, Rudy
Author_Institution :
Dept. of Electr. Eng., Katholieke Univ., Leuven, Belgium
Abstract :
We describe a novel approach to software-implemented fault tolerance that separates error detection from error recovery and offers a distinct programming and processing context for the latter. This allows the application developer to address the non-functional aspects of error recovery separately from the functional behaviour that the user application is supposed to have in the absence of faults. We conjecture that in this way only a limited amount of non-functional code intrudes into the user application, while the bulk of the error-handling strategy is expressed by the user in a “recovery script”, conceptually as well as physically distinct from the functional application layer. Such a script is written in what we call a “recovery language”, i.e. a specialised linguistic framework devoted to the management of fault tolerance strategies, which allows scenarios of isolation, reconfiguration, and recovery to be expressed. These scenarios are executed on meta-entities of the application with physical or logical counterparts (processing nodes, tasks, or user-defined groups of tasks). The developer can therefore modify the fault tolerance strategy with few or no modifications to the application part, or vice versa, tackling either of these two fronts more easily and effectively. This can result in better maintainability of the target fault-tolerant application and in support for portability of the service when the application is moved to different unfavourable environments. The paper positions and discusses the recovery language approach and a prototypal implementation for embedded applications, developed within project TIRAN on a number of distributed platforms.
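The separation described in the abstract can be illustrated with a minimal Python sketch. It is not the TIRAN prototype or the authors' recovery language: every name here (`SystemState`, `Rule`, `report_error`, `isolate`, `restart`, `restart_group`, `run_recovery`) is a hypothetical stand-in chosen for illustration. The application only reports errors; a separate list of guarded rules plays the role of the recovery script and acts on meta-entities (nodes, tasks, user-defined groups of tasks).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class SystemState:
    """Meta-entities the recovery layer reasons about (hypothetical model)."""
    faulty: Set[str] = field(default_factory=set)        # entities reported as erroneous
    isolated: Set[str] = field(default_factory=set)      # entities cut off from the system
    restarted: List[str] = field(default_factory=list)   # log of restart actions, in order
    groups: Dict[str, List[str]] = field(default_factory=dict)  # user-defined task groups


@dataclass
class Rule:
    """One guarded rule of the recovery script: IF condition THEN actions."""
    condition: Callable[[SystemState], bool]
    actions: List[Callable[[SystemState], None]]


def report_error(state: SystemState, entity: str) -> None:
    """Detection side: the only fault-tolerance call made from application code."""
    state.faulty.add(entity)


def is_faulty(entity: str) -> Callable[[SystemState], bool]:
    """Guard that holds when the entity has been reported as erroneous."""
    return lambda state: entity in state.faulty


def isolate(entity: str) -> Callable[[SystemState], None]:
    """Action that marks an entity as isolated."""
    def act(state: SystemState) -> None:
        state.isolated.add(entity)
    return act


def restart(entity: str) -> Callable[[SystemState], None]:
    """Action that logs a restart and clears the entity's faulty flag."""
    def act(state: SystemState) -> None:
        state.restarted.append(entity)
        state.faulty.discard(entity)
    return act


def restart_group(group: str) -> Callable[[SystemState], None]:
    """Action that restarts every task of a user-defined group."""
    def act(state: SystemState) -> None:
        for task in state.groups.get(group, []):
            restart(task)(state)
    return act


def run_recovery(script: List[Rule], state: SystemState) -> int:
    """Recovery side: evaluate each rule in order; return how many fired."""
    fired = 0
    for rule in script:
        if rule.condition(state):
            for action in rule.actions:
                action(state)
            fired += 1
    return fired


# The "recovery script": kept apart from the application logic.
state = SystemState(groups={"workers": ["task2", "task3"]})
script = [
    Rule(is_faulty("node1"), [isolate("node1"), restart_group("workers")]),
    Rule(is_faulty("task1"), [restart("task1")]),
]

report_error(state, "node1")        # application-side detection hook
fired = run_recovery(script, state) # strategy lives entirely in `script`
```

In this sketch, changing the strategy (for example, restarting `node1` instead of isolating it) means editing only `script`, which mirrors the maintainability argument made in the abstract.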
Keywords :
error detection; software fault tolerance; software maintenance; system recovery; error recovery; fault tolerance strategy; functional behaviour; maintainability; meta-entities; prototypal implementation; recovery language approach; software-implemented fault tolerance; specialised linguistic framework; Application software; Electrical fault detection; Fault tolerance; Fault tolerant systems; Hardware; Libraries; Operating systems; Prototypes; Reflection; Runtime
Conference_Title :
Parallel and Distributed Processing, 2001. Proceedings. Ninth Euromicro Workshop on
Conference_Location :
Mantova
Print_ISBN :
0-7695-0987-8
DOI :
10.1109/EMPDP.2001.905070