DocumentCode
2529934
Title
Fault tolerance protocols for parallel programs based on tasks replication
Author
Aguilar, Jose ; Hernández, Marisela
Author_Institution
Dept. de Comput., CEMISID, Merida, Venezuela
fYear
2000
fDate
2000
Firstpage
397
Lastpage
404
Abstract
In this paper we propose a fault-tolerant mechanism for parallel programs based on task replication. We use a sequential discrete-event simulator of a distributed system subject to failures to compare a semi-active approach and a passive approach of the protocol. In our model, each time a task of a given parallel program is allocated, a copy of it is stored in a second processor, called the buddy processor. If the original processor fails, the copies of the tasks at the buddy processor will be processed, providing fault tolerance. Some performance measures, such as program execution times and processor utilization factors, are given for the different versions of the mechanism. The performance has been studied as a function of processor degradation, and program and system sizes.
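The buddy-processor idea in the abstract can be illustrated with a minimal sketch. This is not the authors' simulator or protocol: the names `Processor`, `allocate`, and `run` are hypothetical, and the round-robin allocation and the choice of processor (i + 1) mod n as the buddy are assumptions made only for illustration. The sketch does not model the semi-active or passive variants, timing, or processor degradation.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    pid: int                                       # must equal its list index
    primary: list = field(default_factory=list)    # tasks allocated here
    replicas: list = field(default_factory=list)   # (owner pid, task) copies held as buddy
    failed: bool = False


def allocate(tasks, processors):
    """Allocate tasks round-robin and store a copy of each on a buddy.

    Assumption (not from the paper): the buddy of processor i is
    processor (i + 1) % n.
    """
    n = len(processors)
    for i, task in enumerate(tasks):
        owner = processors[i % n]
        buddy = processors[(owner.pid + 1) % n]
        owner.primary.append(task)
        buddy.replicas.append((owner.pid, task))


def run(processors):
    """Return a dict mapping each executed task to the pid that ran it."""
    executed = {}
    for p in processors:
        if p.failed:
            continue
        for task in p.primary:
            executed[task] = p.pid
        # A buddy processes a copy only if the original owner has failed.
        for owner_pid, task in p.replicas:
            if processors[owner_pid].failed:
                executed[task] = p.pid
    return executed
```

With three processors and four tasks, marking processor 0 as failed before calling `run` causes its tasks to be executed from the copies held by processor 1. In this sketch a task is lost if both its owner and its buddy fail, so it tolerates a single failure per owner/buddy pair.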
Keywords
discrete event simulation; fault tolerant computing; parallel programming; performance evaluation; protocols; buddy processor; distributed system; fault tolerance protocols; fault-tolerant mechanism; parallel programs; performance measures; processor degradation; program execution times; sequential discrete-event simulator; task replication; tasks replication; Application software; Computational modeling; Computer network reliability; Computer networks; Costs; Fault tolerance; Hardware; Protocols; Redundancy; Software performance;
fLanguage
English
Publisher
IEEE
Conference_Titel
Modeling, Analysis and Simulation of Computer and Telecommunication Systems, 2000. Proceedings. 8th International Symposium on
Conference_Location
San Francisco, CA
ISSN
1526-7539
Print_ISBN
0-7695-0728-X
Type
conf
DOI
10.1109/MASCOT.2000.876564
Filename
876564