• DocumentCode
    1399208
  • Title

    Fast and Message-Efficient Global Snapshot Algorithms for Large-Scale Distributed Systems

  • Author

    Kshemkalyani, Ajay D.

  • Author_Institution
    Comput. Sci. Dept., Univ. of Illinois at Chicago, Chicago, IL, USA
  • Volume
    21
  • Issue
    9
  • fYear
    2010
  • Firstpage
    1281
  • Lastpage
    1289
  • Abstract
    Large-scale distributed systems such as supercomputers and peer-to-peer systems typically have a fully connected logical topology over a large number of processors. Existing snapshot algorithms in such systems have high response time and/or require a large number of messages, typically O(n2), where n is the number of processes. In this paper, we present a suite of two algorithms: simple_tree, and hypercube, that are both fast and require a small number of messages. This makes the algorithms highly scalable. Simple_tree requires O(n) messages and has O(log n) response time. Hypercube requires O(n log n) messages and has O(log n) response time, in addition to having the property that the roles of all the processes are symmetrical. Process symmetry implies greater potential for balanced workload and congestion-freedom. All the algorithms assume non-FIFO channels.
  • Keywords
    communication complexity; hypercube networks; large-scale systems; message passing; parallel processing; trees (mathematics); hypercube; large scale distributed systems; message efficient global snapshot algorithms; nonFIFO channels; simple tree algorithms; Distributed system; checkpoint; cluster; distributed snapshot; global state; hypercube; message passing; overlay.; supercomputer;
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    1045-9219
  • Type

    jour

  • DOI
    10.1109/TPDS.2010.24
  • Filename
    5401156