DocumentCode :
2573662
Title :
Algorithm-based fault-tolerant programming in scientific computation on multiprocessors
Author :
Altmann, J. ; Böhm, A.
Author_Institution :
IMMD III, Erlangen-Nürnberg Univ., Germany
fYear :
1995
fDate :
25-27 Jan 1995
Firstpage :
374
Lastpage :
382
Abstract :
Efficient parallel algorithms proposed to solve many fundamental problems in scientific computation are sensitive to processor failures. Because of its low cost, algorithm-based fault tolerance is an interesting concept for introducing fault tolerance into existing multiprocessors. To facilitate fault-tolerant programming in scientific computation, we have modified and further developed an existing parallel run-time environment. This paper primarily examines how known error processing techniques can be tuned to the algorithm-based approach. It also studies design issues for the implementation and the execution-time overhead of a fault-tolerant application in our run-time environment. In contrast to many other environments for parallel fault-tolerant programming, which use the master/slave programming model, our environment enables one to add fault tolerance to existing parallel applications in scientific computation.
Keywords :
multiprocessing systems; parallel algorithms; parallel programming; programming environments; software fault tolerance; algorithm-based fault-tolerant programming; error processing techniques; execution time overhead; master/slave programming model; multiprocessors; parallel run-time environment; scientific computation; Application software; Concurrent computing; Costs; Fault detection; Fault diagnosis; Fault tolerance; Fault tolerant systems; Hardware; Parallel programming; Runtime environment
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel and Distributed Processing, 1995. Proceedings. Euromicro Workshop on
Conference_Location :
San Remo
Print_ISBN :
0-8186-7031-2
Type :
conf
DOI :
10.1109/EMPDP.1995.389185
Filename :
389185