• DocumentCode
    244335
  • Title

    Hardware-Software Integrated Diagnosis for Intermittent Hardware Faults

  • Author

    Dadashi, Majid ; Rashid, Layali ; Pattabiraman, Karthik ; Gopalakrishnan, S.

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of British Columbia (UBC), Vancouver, BC, Canada
  • fYear
    2014
  • fDate
    23-26 June 2014
  • Firstpage
    363
  • Lastpage
    374
  • Abstract
    Intermittent hardware faults are hard to diagnose as they occur non-deterministically at the same location. Hardware-only diagnosis techniques incur significant power and area overheads. On the other hand, software-only diagnosis techniques have low power and area overheads, but have limited visibility into many micro-architectural structures and hence cannot diagnose faults in them. To overcome these limitations, we propose a hardware-software integrated framework for diagnosing intermittent faults. The hardware part of our framework, called SCRIBE, continuously records the resource usage information of every instruction in the processor and exposes it to the software layer. SCRIBE incurs a performance overhead of 12% and a power overhead of 9%, on average. The software part of our framework, called SIED, uses backtracking from the program's crash dump to find the faulty micro-architectural resource. Our technique has an average accuracy of 84% in diagnosing the faulty resource, which in turn enables fine-grained deconfiguration with less than 2% performance loss after deconfiguration.
  • Keywords
    fault diagnosis; hardware-software codesign; program diagnostics; SCRIBE; SIED; fault diagnosis; faulty micro-architectural resource; fine-grained deconfiguration; hardware-only diagnosis techniques; hardware-software integrated diagnosis; hardware-software integrated framework; intermittent hardware faults; micro-architectural structures; Circuit faults; Fault diagnosis; Hardware; Multicore processing; Pipelines; Registers; Software; Backtracking; Dynamic Dependence Graphs; Hardware/Software Co-design; Intermittent Faults;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Dependable Systems and Networks (DSN), 2014 44th Annual IEEE/IFIP International Conference on
  • Conference_Location
    Atlanta, GA
  • Type

    conf

  • DOI
    10.1109/DSN.2014.1
  • Filename
    6903594