• DocumentCode
    2897432
  • Title

    Trace-based microarchitecture-level diagnosis of permanent hardware faults

  • Author

    Li, Man-Lap ; Ramachandran, Pradeep ; Sahoo, Swarup K. ; Adve, Sarita V. ; Adve, V.S. ; Zhou, Yuanyuan

  • Author_Institution
    Dept. of Comput. Sci., Illinois Univ., Champaign, IL
  • fYear
    2008
  • fDate
    24-27 June 2008
  • Firstpage
    22
  • Lastpage
    31
  • Abstract
    As devices continue to scale, future shipped hardware will likely fail due to in-the-field hardware faults. As traditional redundancy-based hardware reliability solutions that tackle these faults will be too expensive to be broadly deployable, recent research has focused on low-overhead reliability solutions. One approach is to employ low-overhead ("always-on") detection techniques that catch high-level symptoms and pay a higher overhead for (rarely invoked) diagnosis. This paper presents trace-based fault diagnosis, a diagnosis strategy that identifies permanent faults in microarchitectural units by analyzing the faulty core's instruction trace. Once a fault is detected, the faulty core is rolled back and re-executes from a previous checkpoint, generating a faulty instruction trace and recording the microarchitecture-level resource usage. A diagnosis process on another fault-free core then generates a fault-free trace which it compares with the faulty trace to identify the faulty unit. Our result shows that this approach successfully diagnoses 98% of the faults studied and is a highly robust and flexible way for diagnosing permanent faults.
  • Keywords
    computer architecture; fault diagnosis; fault tolerance; instruction sets; logic design; logic testing; microprocessor chips; checkpointing; instruction trace-based microarchitecture-level fault diagnosis; microarchitecture-level resource usage; permanent hardware fault; processor-level redundancy-based hardware reliability solution; Circuit faults; Computer science; Fault detection; Fault diagnosis; Hardware; Microarchitecture; Monitoring; Moore's Law; Pervasive computing; Robustness
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Dependable Systems and Networks With FTCS and DCC, 2008. DSN 2008. IEEE International Conference on
  • Conference_Location
    Anchorage, AK
  • Print_ISBN
    978-1-4244-2397-2
  • Electronic_ISBN
    978-1-4244-2398-9
  • Type

    conf

  • DOI
    10.1109/DSN.2008.4630067
  • Filename
    4630067