DocumentCode :
2453426
Title :
Software-Based Online Detection of Hardware Defects: Mechanisms, Architectural Support, and Evaluation
Author :
Constantinides, Kypros ; Mutlu, Onur ; Austin, Todd ; Bertacco, Valeria
Author_Institution :
Univ. of Michigan, Ann Arbor
fYear :
2007
fDate :
1-5 Dec. 2007
Firstpage :
97
Lastpage :
108
Abstract :
As silicon process technology scales deeper into the nanometer regime, hardware defects are becoming more common. Such defects are bound to hinder the correct operation of future processor systems, unless new online techniques become available to detect and to tolerate them while preserving the integrity of software applications running on the system. This paper proposes a new, software-based, defect detection and diagnosis technique. We introduce a novel set of instructions, called access-control extension (ACE), that can access and control the microprocessor's internal state. Special firmware periodically suspends microprocessor execution and uses the ACE instructions to run directed tests on the hardware. When a hardware defect is present, these tests can diagnose and locate it, and then activate system repair through resource reconfiguration. The software nature of our framework makes it flexible: testing techniques can be modified/upgraded in the field to trade off performance with reliability without requiring any change to the hardware. We evaluated our technique on a commercial chip-multiprocessor based on Sun's Niagara and found that it can provide very high coverage, with 99.22% of all silicon defects detected. Moreover, our results show that the average performance overhead of software-based testing is only 5.5%. Based on a detailed RTL-level implementation of our technique, we find its area overhead to be quite modest, with only a 5.8% increase in total chip area.
Keywords :
crystal defects; elemental semiconductors; fault location; fault tolerant computing; firmware; instruction sets; integrated circuit reliability; integrated circuit testing; microprocessor chips; performance evaluation; silicon; software reliability; access-control extension; firmware; hardware defects; microprocessor execution; silicon process technology; software integrity; software-based diagnosis technique; software-based online defect detection; software-based testing; system repair; Checkpointing; Circuit testing; Computer architecture; Degradation; Electronics industry; Fault detection; Hardware; Silicon; Software testing; System recovery;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Microarchitecture, 2007. MICRO 2007. 40th Annual IEEE/ACM International Symposium on
Conference_Location :
Chicago, IL
ISSN :
1072-4451
Print_ISBN :
978-0-7695-3047-5
Electronic_ISBN :
1072-4451
Type :
conf
DOI :
10.1109/MICRO.2007.34
Filename :
4408248