• DocumentCode
    2165725
  • Title

    A new method for transparent fault tolerance of distributed programs on a network of workstations using alternative schedules

  • Author

    Das, Dibyendu ; Dasgupta, Pallab ; Das, P.P.

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Indian Inst. of Technol., Kharagpur, India
  • fYear
    1997
  • fDate
    10-12 Dec 1997
  • Firstpage
    479
  • Lastpage
    486
  • Abstract
    In this paper, we devise a new method for transparent fault tolerance of distributed programs running on a cluster of networked workstations. We use the concept of alternative schedules for this purpose. Such schedules are generated from static task graphs at compile-time. At run-time, a distributed program can use these alternatives to switch from one schedule to another if one or more machines become faulty. We have devised fast and efficient mechanisms for switching among schedules at run-time. This enables recovery from any number of simultaneous machine faults, any number of times. The correctness of the resultant algorithm is ensured by preventing direct data sharing among local tasks on a machine. Such a transparent fault-tolerant strategy is easily implementable on a network of workstations running PVM-like software.
  • Keywords
    fault tolerant computing; local area networks; parallel programming; processor scheduling; program verification; PVM-like softwares; algorithm correctness; direct data sharing; distributed programs; fault recovery; network of workstations; schedules; static task graphs; transparent fault tolerance; Computer crashes; Computer science; Fault tolerance; Fluctuations; Intelligent networks; Parallel processing; Processor scheduling; Runtime; Switches; Workstations;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Algorithms and Architectures for Parallel Processing, 1997. ICAPP 97., 1997 3rd International Conference on
  • Conference_Location
    Melbourne, Vic.
  • Print_ISBN
    0-7803-4229-1
  • Type

    conf

  • DOI
    10.1109/ICAPP.1997.651515
  • Filename
    651515