Title :
Algorithm-based error-detection schemes for iterative solution of partial differential equations
Author :
Roy-Chowdhury, Amber ; Bellas, Nikolas ; Banerjee, Prithviraj
Author_Institution :
Coordinated Sci. Lab., Illinois Univ., Urbana, IL, USA
fDate :
4/1/1996
Abstract :
Algorithm-based fault tolerance is an inexpensive method of achieving fault tolerance without requiring any hardware modifications. For numerical applications involving the iterative solution of linear systems arising from discretization of various PDEs, there exist almost no fault-tolerant algorithms in the literature. We describe an error-detecting version of a parallel algorithm for iteratively solving the Laplace equation over a rectangular grid. This error-detecting algorithm is based on the popular successive overrelaxation scheme with red-black ordering. We use the Laplace equation merely as a vehicle for discussion; we show how to modify the algorithm to devise error-detecting iterative schemes for solving linear systems arising from discretizations of other PDEs, such as the Poisson equation and a variant of the Laplace equation with a mixed derivative term. We also discuss a modification of the basic scheme to handle situations where the underlying solution domain is not rectangular. We then discuss a somewhat different error-detecting algorithm for iterative solution of PDEs that can be expected to yield better error coverage. We also present a new way of dealing with the roundoff errors that complicate the check phase of algorithm-based schemes. Our approach is based on error analysis incorporating some simplifications and gives high fault coverage and no false alarms for a large variety of data sets. We report experimental results on the error coverage and performance overhead of our algorithm-based error-detection schemes on an Intel iPSC/2 hypercube multiprocessor.
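To make the idea of an algorithm-based check on red-black successive overrelaxation concrete, here is a minimal serial sketch in Python. It is not the scheme from the paper, whose details this record does not give. It assumes the standard five-point Laplace stencil, treats points with even i + j as "red", and uses a simple checksum invariant: because every neighbor of a red point is black or on the boundary and stays fixed during the red sweep, the sum of the updated red values can be predicted from pre-sweep data. The function name, the `inject` hook for simulating a fault, and the crude relative tolerance `tol` are all hypothetical; the abstract describes a roundoff threshold derived from error analysis, which is not reproduced here.

```python
def red_sweep_with_check(u, omega, tol=1e-9, inject=None):
    """Run one SOR sweep over the red interior points (i + j even) of the
    2-D grid u (list of lists, boundary rows/columns held fixed), then
    compare the sum of the updated values against a predicted checksum.
    Returns True if the check passes, False if an error is flagged."""
    n, m = len(u), len(u[0])
    red = [(i, j) for i in range(1, n - 1) for j in range(1, m - 1)
           if (i + j) % 2 == 0]

    # Predict the post-sweep red checksum from pre-sweep values only.
    old_sum = sum(u[i][j] for i, j in red)
    nbr_sum = sum(u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                  for i, j in red)
    predicted = (1.0 - omega) * old_sum + 0.25 * omega * nbr_sum

    # The actual SOR update for the five-point Laplace stencil.
    for i, j in red:
        avg = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])
        u[i][j] = (1.0 - omega) * u[i][j] + omega * avg

    # Hypothetical fault injection: corrupt one grid point after the update.
    if inject is not None:
        i, j, delta = inject
        u[i][j] += delta

    # Check phase: allow a small relative discrepancy for roundoff.
    actual = sum(u[i][j] for i, j in red)
    scale = max(1.0, abs(predicted), abs(actual))
    return abs(actual - predicted) <= tol * scale
```

On a small grid with a fixed boundary, the fault-free sweep passes the check, while a sweep with a corrupted red point (for example `inject=(2, 2, 0.1)` on a 6 by 6 grid) is flagged. A black sweep could be checked the same way with the roles of the two colors swapped.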
Keywords :
Laplace equations; error analysis; error detection; fault tolerant computing; iterative methods; mathematics computing; multiprocessing systems; parallel algorithms; partial differential equations; roundoff errors; Intel iPSC/2; Laplace equation; Poisson equation; discretization; error-detecting iterative schemes; error-detection algorithm; fault tolerance; fault-tolerant algorithms; hardware modification; hypercube multiprocessor; iterative solution; linear systems; numerical applications; overrelaxation scheme; parallel algorithm; performance overhead; rectangular grid; red-black ordering; Fault tolerance; Fault tolerant systems; Hardware; Iterative algorithms; Linear systems; Parallel algorithms; Poisson equations; Roundoff errors; Vehicles
Journal_Title :
Computers, IEEE Transactions on