• DocumentCode
    1297354
  • Title

    Analysis and randomized design of algorithm-based fault tolerant multiprocessor systems under an extended model

  • Author

    Yajnik, Shalini ; Jha, Niraj K.

  • Author_Institution
    Lucent Technols., AT&T Bell Labs., Murray Hill, NJ, USA
  • Volume
    8
  • Issue
    7
  • fYear
    1997
  • fDate
    7/1/1997 12:00:00 AM
  • Firstpage
    757
  • Lastpage
    768
  • Abstract
    Reliability of compute-intensive applications can be improved by introducing fault tolerance into the system. Algorithm-based fault tolerance (ABFT) is a low-cost scheme which provides the required fault tolerance to the system through system-level encoding. In this paper, we propose randomized construction techniques, under an extended model, for the design of ABFT systems with the required fault tolerance capability. The model considers failures in the processors performing the checking operations.
  • Keywords
    encoding; fault tolerant computing; multiprocessing systems; randomised algorithms; algorithm-based fault tolerant multiprocessor systems; extended model; fault tolerance; randomized design; system level encoding; Algorithm design and analysis; Computer applications; Degradation; Design methodology; Encoding; Fault detection; Fault diagnosis; Fault location; Fault tolerant systems; Multiprocessing systems
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type

    jour

  • DOI
    10.1109/71.598349
  • Filename
    598349