Abstract :
Notice of Violation of IEEE Publication Principles
"Design, Analysis and Performance Evaluation of a New Algorithm for Developing a Fault Tolerant Distributed System"
by Umasankar Malladi
in the Proceedings of the 12th International Conference on Parallel and Distributed Systems
After careful and considered review of the content and authorship of this paper by a duly constituted expert committee, this paper has been found to be in violation of IEEE's Publication Principles.
This paper contains significant duplication of original text from the paper cited below. The original text was copied without attribution (including appropriate references to the original author(s) and/or paper title) and without permission.
Due to the nature of this violation, reasonable effort should be made to remove all past references to this paper, and future references should be made to the following article:
"Design, Analysis and Performance Evaluation of a New Algorithm for Developing a Fault Tolerant Distributed System"
by Ch.D.V. Subba Rao (Original Author)
in Technical Report CS-SWlab-2004-05/03, Department of Computer Science and Engineering, Sri Venkateswara University College of Engineering, Tirupati, India
Checkpointing and message logging are among the popular general-purpose methods for providing fault tolerance in distributed systems. Several variations of their basic schemes have been reported in the literature. Most coordinated checkpointing algorithms do not address the treatment of lost messages, and schemes that improve several or all performance factors are rare. We address these issues by developing a new and efficient coordinated checkpointing protocol combined with limited sender-based pessimistic message logging. The significant contribution of our scheme is that it never creates lost messages. The term limited message logging implies that ours is a periodic checkpointing strategy in which checkpointing and message logging take place only within a specified interval (called the critical interval, CI). Hence it minimizes checkpoint overhead, rollback distance, message logging, and even recovery overhead. Output commit latency is also reduced considerably. Further, processes need not be blocked while messages are logged. Performance measurements obtained from our simulations indicate that the proposed strategy outperforms the existing standard techniques: independent checkpointing, pure sender-based pessimistic message logging, and optimistic message logging. Another merit of our protocol is that it is hardware independent and hence can be implemented in multi-computer systems irrespective of architecture, interconnection, and routing strategy.
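The idea of limiting sender-based logging to a critical interval can be illustrated with a minimal Python sketch. It is not the protocol from the paper. The class name `LimitedSenderLog`, the parameters `period` and `ci_length`, and the rule that the critical interval is the first `ci_length` time units of each checkpoint period are all assumptions made for illustration; the abstract does not say how the CI is placed or how recovery proceeds.

```python
from dataclasses import dataclass, field


@dataclass
class LimitedSenderLog:
    """Sender-side log that records messages only inside a critical interval.

    Assumption (not from the source): the critical interval is the first
    `ci_length` time units of every checkpoint period of `period` units.
    """
    period: float
    ci_length: float
    log: list = field(default_factory=list)

    def in_critical_interval(self, now: float) -> bool:
        # Position of `now` within the current checkpoint period.
        return (now % self.period) < self.ci_length

    def send(self, now: float, dest: str, payload: str) -> bool:
        """Log the message if `now` is inside the CI; return whether it was logged."""
        logged = self.in_critical_interval(now)
        if logged:
            self.log.append((now, dest, payload))
        # Actual transmission would happen here; it is omitted in this sketch.
        return logged

    def replay_for(self, dest: str) -> list:
        """Payloads this sender could resend to `dest` after a rollback."""
        return [payload for _, d, payload in self.log if d == dest]


sender = LimitedSenderLog(period=10.0, ci_length=2.0)
sender.send(0.5, "P2", "a")   # inside the CI, logged
sender.send(5.0, "P2", "b")   # outside the CI, sent without logging
sender.send(11.0, "P3", "c")  # inside the next period's CI, logged
```

In this sketch the sender keeps only the messages sent inside the critical interval, which is why the log stays small compared with logging every message.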
Keywords :
checkpointing; fault tolerant computing; message passing; protocols; software performance evaluation; coordinated checkpointing protocol; critical interval; fault tolerant distributed system; limited sender-based pessimistic message logging; performance evaluation; Algorithm design and analysis; Checkpointing; Computer science; Fault tolerant systems; Notice of Violation; Performance analysis; Protocols; Critical interval; lost messages etc.; output commit;