Title :
Fault Tolerant Implementation of Peer-to-peer Distributed Iterative Algorithms
Author :
The Tung Nguyen; El-Baz, Didier
Author_Institution :
LAAS, Toulouse, France
Abstract :
Fault tolerance issues related to the implementation of distributed iterative algorithms via the P2PDC peer-to-peer distributed computing environment are considered. P2PDC is a decentralized environment dedicated to task-parallel applications. It has been designed in particular for the solution of large-scale numerical simulation problems via distributed iterative algorithms. The environment allows frequent and direct communications between peers, i.e., machines. P2PDC is based on P2PSAP, a self-adaptive communication protocol. We present new functionalities of P2PDC aimed at making our environment more robust. An adaptive fault tolerance mechanism keeps computations robust in the presence of peer faults. We also consider fault tolerance from an algorithmic point of view: we concentrate in particular on distributed asynchronous iterative algorithms that can tolerate some message loss. A series of computational results is presented and analyzed for a numerical simulation problem.
Keywords :
iterative methods; parallel processing; peer-to-peer computing; protocols; software fault tolerance; P2PDC peer-to-peer distributed computing environment; P2PSAP; adaptive fault tolerance mechanism; decentralized environment; direct communications; distributed asynchronous iterative algorithms; fault tolerant implementation; file sharing; frequent communications; message loss; numerical simulation problems; parallel applications; peer-to-peer distributed iterative algorithms; peer-to-peer self-adaptive communication protocol; Checkpointing; Fault tolerance; Fault tolerant systems; Iterative methods; Peer to peer computing; Resource management; Topology; distributed computing; fault tolerance; numerical simulation; peer to peer computing; task parallel model;
Conference_Title :
Computational Science and Engineering (CSE), 2012 IEEE 15th International Conference on
Conference_Location :
Nicosia
Print_ISBN :
978-1-4673-5165-2
Electronic_ISBN :
978-0-7695-4914-9
DOI :
10.1109/ICCSE.2012.103