• DocumentCode
    1216872
  • Title
    Fail-safeness in a multiprocessor system. A distributed strategy based on backward error recovery
  • Author
    Corsini, P.; Lopriore, L.; Strigini, L.
  • Author_Institution
    Università di Pisa, Istituto di Elettronica e Telecomunicazioni, Pisa, Italy
  • Volume
    2
  • Issue
    6
  • fYear
    1983
  • fDate
    12/1/1983 12:00:00 AM
  • Firstpage
    147
  • Lastpage
    156
  • Abstract
    A method for fault handling is presented, designed for multiprocessor systems supporting concurrent processes that co-operate through message exchange. The proposal is described with reference to a specific system, namely the MuTEAM prototype developed in Pisa. The requirement was that the system should generate no erroneous output under a single-fault hypothesis. The fault-handling model adopted is based on backward error recovery. The set of all application processes is partitioned into disjoint subsets (called families), which are the atomic units of recovery. Recovery points are established on communications among families. A single consistent recovery line is maintained, thereby avoiding the domino effect. The model does not rely on mass storage devices; instead, the recovery information for all processes is kept in the distributed main memory of the system.
  • Keywords
    fault tolerant computing; multiprocessing systems; parallel processing; system recovery; MuTEAM prototype; backward error recovery; concurrent processes; fault handling; message exchange; multiprocessor systems; single-fault hypothesis
  • fLanguage
    English
  • Journal_Title
    Software & Microsystems
  • Publisher
    IET
  • ISSN
    0261-3182
  • Type
    jour
  • DOI
    10.1049/sm.1983.0054
  • Filename
    4807973
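The recovery scheme summarized in the abstract (families as atomic recovery units, recovery points established on inter-family communication, one consistent recovery line kept in main memory) can be illustrated with a toy Python model. This is a minimal sketch under stated assumptions, not the authors' MuTEAM implementation. The names `Family`, `System`, `send`, `recover` and `recovery_line` are hypothetical. The sketch also assumes that both families establish a recovery point immediately after each inter-family message, and that errors are detected before an exchange is committed.

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class Family:
    """Hypothetical family: a disjoint set of processes recovered as one unit."""
    name: str
    state: dict = field(default_factory=dict)           # live state of the family's processes
    recovery_point: dict = field(default_factory=dict)  # saved copy, held in (simulated) main memory

    def establish_recovery_point(self) -> None:
        self.recovery_point = deepcopy(self.state)

    def roll_back(self) -> None:
        self.state = deepcopy(self.recovery_point)


class System:
    """Toy system: the families' recovery points together form the recovery line."""

    def __init__(self, families: list) -> None:
        self.families = {f.name: f for f in families}
        for f in families:
            f.establish_recovery_point()

    def send(self, src: str, dst: str, key: str, value) -> None:
        sender, receiver = self.families[src], self.families[dst]
        # Assumption: the message is delivered, then BOTH families establish a
        # recovery point, so the exchange is recorded on both sides of the line.
        # Assumption: errors are detected before this point; a committed
        # erroneous message cannot be undone by this scheme.
        sender.state.setdefault("sent", []).append((dst, key, value))
        receiver.state[key] = value
        sender.establish_recovery_point()
        receiver.establish_recovery_point()

    def recover(self, name: str) -> None:
        # Only the faulty family rolls back. No inter-family message was exchanged
        # after its recovery point, so no other family has to roll back as well
        # (no cascading rollback, i.e. no domino effect).
        self.families[name].roll_back()

    def recovery_line(self) -> dict:
        return {n: f.recovery_point for n, f in self.families.items()}


if __name__ == "__main__":
    a, b = Family("A", {"x": 1}), Family("B")
    system = System([a, b])
    system.send("A", "B", "y", 42)
    a.state["x"] = -999      # simulated local error inside family A
    system.recover("A")      # only A is restored; B keeps the message it received
    print(a.state["x"], b.state["y"])  # 1 42
```

Committing the recovery points after the exchange, rather than before it, keeps the sketch from producing orphan or duplicated messages when a single family is rolled back.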