• DocumentCode
    1448212
  • Title
    Adaptive system-level diagnosis for hypercube multiprocessors
  • Author
    Feng, Chao; Bhuyan, Laxmi N.; Lombardi, Fabrizio
  • Author_Institution
    Land Mobile Product Center, Motorola Inc., Schaumburg, IL, USA
  • Volume
    45
  • Issue
    10
  • fYear
    1996
  • fDate
    10/1/1996 12:00:00 AM
  • Firstpage
    1157
  • Lastpage
    1170
  • Abstract
    System-level diagnosis is an important technique for fault detection and location in multiprocessor computing systems. Efficient diagnosis is highly desirable for sustaining the original computing power of the system, and effective diagnosis is particularly important for multiprocessor systems with high scalability but low connectivity. Most existing results are impractical because of their high diagnosis cost and limited diagnosability. Over-d fault diagnosis, where d is the diagnosability and the number of faulty units exceeds d, has so far been addressed in the literature only with a probabilistic method. To address these two issues, we propose a hierarchical adaptive system-level diagnosis approach for hypercube systems based on a divide-and-conquer strategy. We first propose a conceptual algorithm, HADA, as the basis for a rigorous analysis, and then present IHADA, its practical variant. Both HADA and IHADA inherently handle the over-d fault problem through a deterministic method. Three measures of diagnosis cost (diagnosis time, number of tests, and number of test links) are analyzed for the proposed algorithms. It is proved that the diagnosis cost of our approach is lower than that of previous diagnosis algorithms. It is also shown that the diagnosis cost of the proposed algorithms depends on the number and location of faulty units in the system, and that this cost is extremely low when only a few faulty units exist. Furthermore, our algorithms incur lower costs than a pessimistic diagnosis algorithm that sacrifices accuracy to reduce its diagnosis cost. Experimental results on the nCUBE are provided.
  • Keywords
    fault diagnosis; hypercube networks; multiprocessing systems; HADA; diagnosability; divide-and-conquer; hierarchical adaptive; hypercube multiprocessors; hypercube systems; multiprocessor computing systems; system-level diagnosis; Adaptive systems; Algorithm design and analysis; Costs; Fault detection; Fault diagnosis; Hypercubes; Multiprocessing systems; Scalability; Testing; Time measurement
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Computers
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type
    jour
  • DOI
    10.1109/12.543709
  • Filename
    543709