Author :
Li, Yanjing ; Mutlu, Onur ; Gardner, Donald S. ; Mitra, Subhasish
Author_Institution :
Stanford Univ., Stanford, CA, USA
Abstract :
Concurrent autonomous self-test, or online self-test, allows a system to test itself concurrently during normal operation, with no system downtime visible to the end user. Online self-test is important for overcoming major reliability challenges such as early-life failures and circuit aging in future Systems-on-Chip (SoCs). To ensure required levels of overall reliability of SoCs, it is essential to apply online self-test to uncore components, e.g., cache controllers, DRAM controllers, and I/O controllers, in addition to processor cores. This is because uncore components can account for a significant portion of the overall logic area of a multi-core SoC. In this paper, we present an efficient online self-test technique for uncore components in SoCs. We achieve extremely high test coverage by storing high-quality test patterns in off-chip non-volatile storage. However, a simple technique that stalls the uncore-component-under-test can result in significant system performance degradation or even visible system unresponsiveness. Our new techniques overcome these challenges and enable cost-effective online self-test of uncore components through three special hardware features: 1. resource reallocation and sharing (RRS); 2. no-performance-impact testing; and 3. smart backups. Implementation of online self-test for uncore components of the open-source OpenSPARC T2 multi-core SoC, using a combination of these three techniques, achieves high test coverage at < 1% area impact, < 1% power impact, and < 3% system-level performance impact. These results demonstrate the effectiveness and practicality of our techniques.
Keywords :
ageing; integrated circuit reliability; integrated circuit testing; system-on-chip; DRAM controllers; I/O controllers; cache controllers; circuit aging; concurrent autonomous self-test; early-life failure; high-quality test patterns; no-performance-impact testing; off-chip nonvolatile storage; online self-test; open-source OpenSPARC T2 multicore SoC; processor cores; reliability; resource reallocation; resource sharing; smart backups; system-on-chips; uncore components; Aging; Automatic testing; Built-in self-test; Circuit testing; Degradation; Logic; Random access memory; System performance; System testing; System-on-a-chip