Title :
An Adaptive Checkpointing Scheme for Peer-to-Peer Based Volunteer Computing Work Flows
Author :
Ni, Lei ; Harwood, Aaron
Author_Institution :
Dept. of Comput. Sci. & Software Eng., Univ. of Melbourne, Melbourne, VIC
Abstract :
Volunteer computing, sometimes called public resource computing, is an emerging computational model that is well suited to work-pooled parallel processing. As more complex grid applications make use of work flows in their design and deployment, it is reasonable to consider the impact of deploying work flows over a volunteer computing infrastructure. In this case, the inter-work-flow I/O can significantly increase I/O demand at the work pool server. A possible solution is a peer-to-peer based parallel computing architecture that off-loads this I/O demand to the workers, which can then fulfill some aspects of work flow coordination, I/O checking, and similar tasks. However, achieving robustness in such a large-scale system is a challenging hurdle to the decentralized execution of work flows and general parallel processes. To increase robustness, we propose, and show the merits of, an adaptive checkpoint scheme that efficiently checkpoints the status of the parallel processes according to estimates of relevant network and peer parameters. Based on our proposed mathematical checkpoint model, the scheme uses statistical data observed at runtime to make checkpoint decisions dynamically and in a completely decentralized manner. Simulation results support the proposed approach, showing a reduction in required runtime.
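As a rough illustration of what "adaptive" can mean here, the sketch below lets a single peer re-estimate the mean time between failures from lifetimes it observes at runtime and derive a checkpoint interval from that estimate. It is not the mathematical checkpoint model proposed in the paper, which this record does not give. It substitutes Young's classic first-order approximation, interval = sqrt(2 * checkpoint cost * MTBF), and the class name, method names, and smoothing factor are all hypothetical choices made for the example.

```python
import math


class AdaptiveCheckpointer:
    """Hypothetical per-peer checkpoint policy (illustration only).

    Assumption: Young's approximation stands in for the paper's own
    checkpoint model, which is not reproduced in this record.
    """

    def __init__(self, checkpoint_cost_s, initial_mtbf_s, alpha=0.2):
        self.checkpoint_cost_s = checkpoint_cost_s  # time to write one checkpoint
        self.mtbf_s = initial_mtbf_s                # current MTBF estimate
        self.alpha = alpha                          # assumed smoothing factor

    def observe_lifetime(self, lifetime_s):
        # Fold a newly observed peer lifetime into the estimate using an
        # exponential moving average, so the policy tracks changing churn.
        self.mtbf_s = (1 - self.alpha) * self.mtbf_s + self.alpha * lifetime_s

    def interval_s(self):
        # Young's first-order approximation of the optimal interval.
        return math.sqrt(2.0 * self.checkpoint_cost_s * self.mtbf_s)

    def should_checkpoint(self, seconds_since_last):
        # Local decision: no coordinator is consulted.
        return seconds_since_last >= self.interval_s()


if __name__ == "__main__":
    policy = AdaptiveCheckpointer(checkpoint_cost_s=2.0, initial_mtbf_s=3600.0)
    print(policy.interval_s())
    policy.observe_lifetime(600.0)  # a short-lived peer was observed
    print(policy.interval_s())      # the interval shrinks as churn rises
```

Each peer keeps its own estimate and decides locally, which mirrors the decentralized decision making described above. A fuller model would presumably also account for network parameters such as the cost of transferring a checkpoint.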
Keywords :
checkpointing; grid computing; parallel processing; peer-to-peer computing; software architecture; adaptive checkpointing; grid applications; parallel computing architecture; public resource computing; volunteer computing work flows; work pool server; Checkpointing; Computational modeling; Computer architecture; Concurrent computing; Grid computing; Large-scale systems; Parallel processing; Peer to peer computing; Robustness; Runtime
Conference_Title :
Parallel and Distributed Computing, Applications and Technologies, 2008. PDCAT 2008. Ninth International Conference on
Conference_Location :
Otago
Print_ISBN :
978-0-7695-3443-5
DOI :
10.1109/PDCAT.2008.53