• DocumentCode
    1925481
  • Title
    A symmetric O(n log n) message distributed snapshot algorithm for large-scale systems
  • Author
    Kshemkalyani, Ajay D.
  • Author_Institution
    Comput. Sci. Dept., Univ. of Illinois at Chicago, Chicago, IL, USA
  • fYear
    2009
  • fDate
    Aug. 31 2009-Sept. 4 2009
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    This paper presents an O(n log n) message distributed snapshot algorithm for a system with non-FIFO channels, where n is the number of processors. The algorithm finds applications for checkpointing in large-scale supercomputers and distributed systems that have a fully connected logical topology over a large number of processors. Each processor sends log n messages in the algorithm. The sizes of the messages are geometrically distributed, and the sum of the sizes of the messages sent by any processor is n. The response time of the algorithm is O(log n). The algorithm is fully distributed and the role of each processor is symmetric, unlike tree-based, ring-based, and centralized algorithms.
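The message counts stated in the abstract can be checked with a small calculation. The sketch below is not the paper's algorithm: it assumes a hypercube-style recursive-doubling exchange (suggested only by the "Hypercubes" keyword), n a power of two, and message sizes counted in units of one processor's local state. The names `message_sizes` and `recursive_doubling` are hypothetical. Under these assumptions each processor sends log2(n) messages of sizes 1, 2, 4, ..., n/2, which sum to n - 1 (n to within one unit), and the system sends n log2(n) messages in log2(n) rounds. Handling of non-FIFO channels and snapshot consistency is not modeled.

```python
def message_sizes(n):
    """Assumed per-processor message sizes for n = 2**k processors.

    Round r sends 2**r state entries, giving 1, 2, 4, ..., n/2.
    This doubling schedule is an illustrative assumption, not taken
    from the paper.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two and at least 2")
    rounds = n.bit_length() - 1  # log2(n) for a power of two
    return [2 ** r for r in range(rounds)]


def recursive_doubling(n):
    """Simulate an all-to-all exchange over an assumed hypercube.

    In round r, processor i sends everything it knows to i XOR 2**r.
    Returns (known, total_messages), where known[i] is the set of
    processor ids whose state has reached processor i.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two and at least 2")
    known = [{i} for i in range(n)]
    total_messages = 0
    for r in range(n.bit_length() - 1):
        # Copy the state at the start of the round so that all sends
        # in a round are based on pre-round knowledge.
        before = [set(s) for s in known]
        for i in range(n):
            partner = i ^ (1 << r)
            known[partner] |= before[i]
            total_messages += 1
    return known, total_messages


if __name__ == "__main__":
    n = 8
    sizes = message_sizes(n)
    known, total = recursive_doubling(n)
    print("sizes per processor:", sizes, "sum:", sum(sizes))
    print("total messages:", total)
    print("all states collected:", all(k == set(range(n)) for k in known))
```

Every processor runs the same steps in this sketch, which matches the symmetric role the abstract describes, in contrast to a tree or ring where some processors act as roots or coordinators.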
  • Keywords
    checkpointing; computational complexity; distributed processing; large-scale systems; mainframes; large scale supercomputers; message distributed snapshot; non-FIFO channels; Application software; Computer science; Costs; Delay; Hypercubes; Supercomputers; Topology; Tree graphs
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster Computing and Workshops, 2009. CLUSTER '09. IEEE International Conference on
  • Conference_Location
    New Orleans, LA
  • ISSN
    1552-5244
  • Print_ISBN
    978-1-4244-5011-4
  • Electronic_ISBN
    1552-5244
  • Type
    conf
  • DOI
    10.1109/CLUSTR.2009.5289139
  • Filename
    5289139