  • DocumentCode
    1854019
  • Title
    An Adaptive Checkpointing Scheme for Peer-to-Peer Based Volunteer Computing Work Flows
  • Author
    Ni, Lei; Harwood, Aaron
  • Author_Institution
    Dept. of Comput. Sci. & Software Eng., Univ. of Melbourne, Melbourne, VIC
  • fYear
    2008
  • fDate
    1-4 Dec. 2008
  • Firstpage
    227
  • Lastpage
    234
  • Abstract
    Volunteer computing, sometimes called public resource computing, is an emerging computational model that is well suited to work-pooled parallel processing. As more complex grid applications make use of work flows in their design and deployment, it is reasonable to consider the impact of deploying work flows over a volunteer computing infrastructure. In this case, the inter-work-flow I/O can significantly increase I/O demands at the work pool server. A possible solution is to use a peer-to-peer based parallel computing architecture to off-load this I/O demand to the workers, which can then take on some aspects of work flow coordination, I/O checking, and similar tasks. However, achieving robustness in such a large-scale system is a challenging hurdle for the decentralized execution of work flows and general parallel processes. To increase robustness, we propose, and show the merits of, an adaptive checkpoint scheme that efficiently checkpoints the status of the parallel processes according to estimates of relevant network and peer parameters. Based on our proposed mathematical checkpoint model, the scheme uses statistical data observed at runtime to make checkpoint decisions dynamically and in a completely decentralized manner. Simulation results support the proposed approach, showing a reduction in required runtime.
  • Keywords
    checkpointing; grid computing; parallel processing; peer-to-peer computing; software architecture; adaptive checkpointing; grid applications; parallel computing architecture; public resource computing; volunteer computing work flows; work pool server; Checkpointing; Computational modeling; Computer architecture; Concurrent computing; Grid computing; Large-scale systems; Parallel processing; Peer to peer computing; Robustness; Runtime
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Computing, Applications and Technologies, 2008. PDCAT 2008. Ninth International Conference on
  • Conference_Location
    Otago
  • Print_ISBN
    978-0-7695-3443-5
  • Type
    conf
  • DOI
    10.1109/PDCAT.2008.53
  • Filename
    4710985