Title :
Recent results in checkpointing and failure recovery in distributed systems and wireless networks
Author_Institution :
Dept. of Comput. Sci., Univ. of Kentucky Lexington, Lexington, KY, USA
Abstract :
Summary form only given. Distributed systems today are ubiquitous and enable many applications, including client-server systems, transaction processing, the World Wide Web, and scientific computing. Distributed systems are not inherently fault-tolerant, and their vast computing potential is often hampered by their susceptibility to failures. Many techniques, such as transactions, group communication, and rollback recovery, have been developed to add reliability and high availability to distributed systems. This talk deals with rollback recovery protocols, which restore the system to a consistent state after a failure. Fault tolerance is achieved by periodically saving the state of a process during failure-free execution and, upon a failure, restarting from a saved state to reduce the amount of lost work. The speaker will present his recent results in checkpointing and failure recovery in distributed systems and wireless networks. Specifically, he will present a classification of checkpointing algorithms, a communication-induced checkpointing algorithm that prevents useless checkpoints by tracking and preventing potential Z-cycles, and the concept of mutable checkpoints for efficient checkpointing in wireless networks. He will conclude the talk with some open problems.
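The basic idea of periodic checkpointing and restart described in the abstract can be illustrated with a minimal single-process sketch. This is not one of the speaker's algorithms: the class `CheckpointedCounter`, the function `run_with_failure`, and the parameters `checkpoint_interval` and `fail_after` are all hypothetical names chosen for illustration, and the sketch ignores the cross-process consistency issues (such as Z-cycles) that the talk addresses.

```python
import copy


class CheckpointedCounter:
    """Toy process whose state is a running total (hypothetical example)."""

    def __init__(self, checkpoint_interval):
        self.checkpoint_interval = checkpoint_interval
        self.state = {"step": 0, "total": 0}
        # The initial state serves as the first checkpoint.
        self.saved = copy.deepcopy(self.state)

    def step(self, value):
        """Do one unit of work; save a checkpoint every checkpoint_interval steps."""
        self.state["step"] += 1
        self.state["total"] += value
        if self.state["step"] % self.checkpoint_interval == 0:
            self.saved = copy.deepcopy(self.state)

    def recover(self):
        """Roll back to the last checkpoint and return the number of lost steps."""
        lost = self.state["step"] - self.saved["step"]
        self.state = copy.deepcopy(self.saved)
        return lost


def run_with_failure(values, checkpoint_interval, fail_after):
    """Process values, simulate one failure after fail_after steps, then finish."""
    proc = CheckpointedCounter(checkpoint_interval)
    for v in values[:fail_after]:
        proc.step(v)
    # Simulated failure: unsaved progress is lost, so restart from the checkpoint.
    lost = proc.recover()
    # Re-execute the lost work and continue with the remaining input.
    for v in values[proc.state["step"]:]:
        proc.step(v)
    return proc.state["total"], lost
```

In this sketch a shorter `checkpoint_interval` reduces the work lost on a failure at the cost of saving state more often, which is the trade-off the abstract alludes to when it mentions reducing the amount of lost work.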
Keywords :
checkpointing; distributed processing; software fault tolerance; Z-cycles; communication-induced checkpointing algorithm; distributed systems; failure recovery; fault tolerance; group communication; mutable checkpoints concept; rollback recovery; transactions; wireless networks; Availability; Checkpointing; Client-server systems; Distributed computing; Fault tolerant systems; Pervasive computing; Protocols; Scientific computing; Web sites; Wireless networks;
Conference_Title :
Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW), 2010 IEEE International Symposium on
Conference_Location :
Atlanta, GA
Print_ISBN :
978-1-4244-6533-0
DOI :
10.1109/IPDPSW.2010.5470819