DocumentCode :
3444153
Title :
Replicating statement execution for fault detection on distributed memory multiprocessors
Author :
Gong, Chun ; Melhem, Rami ; Gupta, Rajiv
Author_Institution :
Dept. of Comput. Sci., Pittsburgh Univ., PA, USA
fYear :
1994
fDate :
12-14 Jun 1994
Firstpage :
132
Lastpage :
141
Abstract :
A compiler-assisted methodology is proposed for fault detection on distributed-memory systems. Selected instances of program statements are replicated in a way that ensures appropriate coverage. Replication strategies for the detection of permanent and transient faults are presented. These strategies use idle processor times for replicating statement execution whenever possible. Two approaches are also discussed for implementing the proposed strategies on single-program multiple-data parallel execution platforms. The first approach replicates program statements through source-to-source program transformations, while the second achieves the replication of program statements indirectly by replicating data on multiple processors.
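The core idea behind replicating statement execution for permanent-fault detection can be illustrated with a minimal sketch. It assumes a hypothetical SPMD model in which statement instances 0..n-1 are block-distributed over processors, each instance is re-executed on a different ("checker") processor, and the two results are compared. The names `execute`, `replicated_run`, and the `faulty` set used to inject a simulated permanent fault are illustrative assumptions, not the authors' method; the sketch also ignores the idle-time scheduling and the source-to-source and data-replication implementations the abstract describes.

```python
from typing import Callable, List, Set, Tuple


def execute(pid: int, i: int, stmt: Callable[[int], int], faulty: Set[int]) -> int:
    """Run statement instance i on processor pid (hypothetical model).

    A processor in `faulty` simulates a permanent fault by corrupting
    every result it produces.
    """
    result = stmt(i)
    return result + 1 if pid in faulty else result


def replicated_run(n: int, nprocs: int, stmt: Callable[[int], int],
                   faulty: Set[int]) -> Tuple[List[int], List[int]]:
    """Execute each instance on its owner and replicate it on another processor.

    Returns the owners' results and the indices whose primary and replica
    results disagree (i.e. where a fault was detected).
    """
    # A permanent fault is exposed only if the replica runs elsewhere.
    assert nprocs >= 2, "replication on a different processor needs >= 2 processors"
    block = (n + nprocs - 1) // nprocs      # block distribution of instances
    results: List[int] = []
    mismatches: List[int] = []
    for i in range(n):
        owner = i // block
        checker = (owner + 1) % nprocs      # hypothetical choice of checker processor
        primary = execute(owner, i, stmt, faulty)
        replica = execute(checker, i, stmt, faulty)
        results.append(primary)
        if primary != replica:
            mismatches.append(i)
    return results, mismatches


if __name__ == "__main__":
    # 8 instances of a[i] = i * i on 4 processors; processor 2 is faulty.
    _, detected = replicated_run(8, 4, lambda i: i * i, {2})
    print(detected)  # instances owned or checked by processor 2
```

With processor 2 faulty, every instance that processor 2 either owns or checks produces a mismatch, while a fault-free run produces none. Under this single-fault assumption, placing the replica on a different processor is what makes a permanent fault visible; for transient faults, re-execution on the same processor at a different time could serve the same purpose.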
Keywords :
distributed memory systems; fault tolerant computing; compiler-assisted methodology; distributed memory multiprocessors; distributed-memory systems; fault detection; idle processor times; permanent faults; program statements; source-to-source program transformations; statement execution replication; transient faults; Computer science; Costs; Fault detection; Hardware; Multiprocessing systems; Processor scheduling; Program processors; Random access memory; Redundancy; VLIW;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Fault-Tolerant Parallel and Distributed Systems, 1994., Proceedings of IEEE Workshop on
Conference_Location :
College Station, TX
Print_ISBN :
0-8186-6807-5
Type :
conf
DOI :
10.1109/FTPDS.1994.494484
Filename :
494484