  • DocumentCode
    2249690
  • Title
    Fault tolerance for parallel applications through replication
  • Author
    Hong Shum, Kam
  • Author_Institution
    Dept. of Inf. Syst. & Comput. Sci., Nat. Univ. of Singapore, Singapore
  • Volume
    3
  • fYear
    1997
  • fDate
    9-12 Sep 1997
  • Firstpage
    1462
  • Abstract
    An efficient fault-tolerant model for parallel computing on workstation clusters, based on the technique of replication, is proposed. The model is built on top of a runtime system that supports resource allocation for parallel applications running on heterogeneous workstation clusters. According to the results of resource allocation, replicated parallel applications can minimize their resource consumption through runtime reconfiguration. In addition, checkpointed states are transferred only among replicated applications, so no expensive disk read/write operations are required.
  • Keywords
    computer networks; fault tolerant computing; performance evaluation; resource allocation; fault tolerance; fault-tolerant model; heterogeneous workstation clusters; parallel applications; replication; resource allocation; runtime system; workstation clusters; Application software; Computer science; Fault tolerance; Fault tolerant systems; Information systems; Interference; Parallel processing; Resource management; Runtime; Workstations;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information, Communications and Signal Processing, 1997. ICICS., Proceedings of 1997 International Conference on
  • Print_ISBN
    0-7803-3676-3
  • Type
    conf
  • DOI
    10.1109/ICICS.1997.652234
  • Filename
    652234