• DocumentCode
    2256276
  • Title
    An integrated error-recovery scheme for multicomputers
  • Author
    DeBrunner, Linda S. ; Ghali, Prasanna
  • Author_Institution
    Dept. of Electr. Eng., Oklahoma Univ., Norman, OK, USA
  • fYear
    1993
  • fDate
    1-3 Nov 1993
  • Firstpage
    817
  • Abstract
    The paper examines the problem of recovering from processing node failures in multicomputers. A general-purpose, application-transparent integrated error-recovery scheme is presented to recover the multicomputer from processing node failures in the absence of concurrent fault detection and diagnosis facilities. In the scheme, a distributed system-level fault diagnosis algorithm and error recovery algorithms cooperate to obtain a set of consistent and error-free checkpoints. The integration of fault diagnosis and error recovery algorithms permits the implementation of an effective and comprehensive fault tolerance scheme for a wide variety of distributed systems and multicomputer networks.
  • Keywords
    computer network reliability; fault diagnosis; fault tolerant computing; multiprocessing systems; system recovery; application-transparent integrated error-recovery scheme; concurrent fault detection; distributed systems; error recovery algorithms; error-free checkpoints; fault tolerance scheme; integrated error-recovery scheme; multicomputer networks; multicomputers; processing node failures; Circuit faults; Clocks; Fault detection; Fault diagnosis; Fault tolerance; Fault tolerant systems; Routing protocols; Switching circuits; System recovery; Testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signals, Systems and Computers, 1993. 1993 Conference Record of The Twenty-Seventh Asilomar Conference on
  • Conference_Location
    Pacific Grove, CA
  • ISSN
    1058-6393
  • Print_ISBN
    0-8186-4120-7
  • Type
    conf
  • DOI
    10.1109/ACSSC.1993.342635
  • Filename
    342635