• DocumentCode
    909349
  • Title

    Fault tolerance in multiprocessor systems without dedicated redundancy

  • Author

    Agrawal, Prathima

  • Author_Institution
    AT&T Bell Labs., Murray Hill, NJ, USA
  • Volume
    37
  • Issue
    3
  • fYear
    1988
  • fDate
    3/1/1988 12:00:00 AM
  • Firstpage
    358
  • Lastpage
    362
  • Abstract
    An algorithm called RAFT (recursive algorithm for fault tolerance) for achieving fault tolerance in multiprocessor systems is described. Through the use of a combination of dynamic space- and time-redundancy techniques, RAFT achieves fault tolerance in the presence of permanent as well as intermittent faults. Performance and reliability of multiprocessor systems using RAFT are determined as a function of individual processor reliability and the total number of fault modes in a processor. RAFT-based systems are superior to triple modular redundancy (TMR) systems in hardware economy and provide comparable reliability. A multiprocessor architecture adopting RAFT is given.
  • Keywords
    fault tolerant computing; multiprocessing systems; RAFT; dynamic space redundancy; fault tolerance; multiprocessor systems; recursive algorithm for fault tolerance; time-redundancy techniques; triple modular redundancy; Checkpointing; Computer architecture; Fault detection; Fault diagnosis; Fault tolerance; Fault tolerant systems; Hardware; Multiprocessing systems; Nuclear magnetic resonance; Redundancy
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type

    jour

  • DOI
    10.1109/12.2174
  • Filename
    2174