• DocumentCode
    1991685
  • Title
    Analysis and experimental evaluation of comparison-based system-level diagnosis for multiprocessor systems
  • Author
    Hongying Wang ; Blough, D.M. ; Alkalaj, L.
  • Author_Institution
    Dept. of Electr. & Comput. Eng., California Univ., Irvine, CA, USA
  • fYear
    1994
  • fDate
    15-17 June 1994
  • Firstpage
    55
  • Lastpage
    64
  • Abstract
    A comparison-based model for system-level fault diagnosis that generalizes both the classical PMC model and the Maeng/Malek comparison model is studied. A new necessary and sufficient condition for a system to be t-diagnosable under this model is proven. Also, a class of systems that uses the minimum number of communication links to obtain a given degree of diagnosability is presented. Next, a distributed diagnosis algorithm is presented that can reduce the number of tests necessary for diagnosis when the number of faults is relatively small. To demonstrate the practicality of our diagnosis approach, a fault table based diagnosis algorithm suitable for relatively small systems has been implemented in the Common Spaceborne Multicomputer Operating System (COSMOS). A simulator for the JPL MAX multicomputer system running COSMOS was used to test the algorithm and evaluate its performance. The results show that the algorithm diagnoses all fault situations with low latency and very little overhead.
  • Keywords
    distributed algorithms; failure analysis; multiprocessing programs; multiprocessing systems; operating systems (computers); parallel processing; program diagnostics; COSMOS; Common Spaceborne Multicomputer Operating System; JPL MAX multicomputer system; Maeng/Malek comparison model; classical PMC model; communication links; comparison-based model; comparison-based system-level diagnosis; diagnosability; multiprocessor systems; system-level fault diagnosis; Delay; Fault diagnosis; Fault tolerant systems; Laboratories; Military computing; Multiprocessing systems; Propulsion; Redundancy; Space technology; Testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fault-Tolerant Computing, 1994. FTCS-24. Digest of Papers., Twenty-Fourth International Symposium on
  • Conference_Location
    Austin, TX, USA
  • Print_ISBN
    0-8186-5520-8
  • Type
    conf
  • DOI
    10.1109/FTCS.1994.315657
  • Filename
    315657