DocumentCode :
1886347
Title :
Compiler assisted fault detection for distributed-memory systems
Author :
Gong, Chun ; Melhem, Rani ; Gupta, Rajiv
Author_Institution :
Dept. of Comput. Sci., Pittsburgh Univ., PA, USA
fYear :
1994
fDate :
23-25 May 1994
Firstpage :
373
Lastpage :
380
Abstract :
Distributed-memory systems provide the most promising performance-to-cost ratio for multiprocessor computers due to their scalability. However, fault detection and fault tolerance are critical in such systems, since the probability of having faulty components increases with the number of processors. We propose a methodology for fault detection through compiler support. More specifically, we augment the single-program multiple-data (SPMD) execution model to duplicate selected data items in such a way that, during execution, whenever the value of a duplicated data item is computed, the owners of that item are tested. The proposed compiler-assisted fault detection technique does not require any specialized hardware and allows for a selective choice of redundancy at compile time.
Keywords :
computer debugging; distributed memory systems; fault tolerant computing; program compilers; reliability; software reliability; compile time; compiler assisted fault detection; data item duplication; distributed-memory systems; fault tolerance; multiprocessor computers; performance to cost ratio; probability; redundancy; scalability; single-program multiple-data execution model; specialized hardware; Computer science; Costs; Distributed computing; Fault detection; Fault tolerance; Fault tolerant systems; Hardware; Multiprocessing systems; Redundancy; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Scalable High-Performance Computing Conference, 1994., Proceedings of the
Conference_Location :
Knoxville, TN
Print_ISBN :
0-8186-5680-8
Type :
conf
DOI :
10.1109/SHPCC.1994.296667
Filename :
296667