DocumentCode :
2522194
Title :
A problem-specific fault-tolerance mechanism for asynchronous, distributed systems
Author :
Iamnitchi, Adriana ; Foster, Ian
Author_Institution :
Dept. of Comput. Sci., Chicago Univ., IL, USA
fYear :
2000
fDate :
2000
Firstpage :
4
Lastpage :
13
Abstract :
The idle computers on a local area, campus area, or even wide area network represent a significant computational resource, but one that is also unreliable, heterogeneous, and opportunistic. We describe an algorithm that allows branch-and-bound problems to be solved in such environments. In designing this algorithm, we faced two challenges: (1) scalability, to effectively exploit the variably sized pools of resources available, and (2) fault tolerance, to ensure the reliability of services. We achieve scalability through a fully decentralized algorithm, in which the dynamically available resources are managed through a membership protocol. We guarantee fault tolerance in the sense that the loss of up to all but one resource will not affect the quality of the solution. For propagating information reliably, we use epidemic communication for both the membership protocol and the fault-tolerance mechanism. We have developed a simulation framework that allows us to evaluate design alternatives. Results obtained in this framework suggest that our techniques can execute scalably and reliably.
Keywords :
distributed algorithms; fault tolerant computing; tree searching; branch-and-bound problems; computational resource; decentralized algorithm; distributed systems; fault tolerance; fault-tolerance mechanism; idle computers; membership protocol; reliability; Algorithm design and analysis; Computer networks; Computer science; Fault tolerance; Fault tolerant systems; Internet; Laboratories; Middleware; Protocols; Scalability;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel Processing, 2000. Proceedings. 2000 International Conference on
Conference_Location :
Toronto, Ont.
ISSN :
0190-3918
Print_ISBN :
0-7695-0768-9
Type :
conf
DOI :
10.1109/ICPP.2000.876065
Filename :
876065