Title :
Accurate and inexpensive performance monitoring for variability-aware systems
Author :
Lai, Liangzhen ; Gupta, Puneet
Author_Institution :
Dept. of Electr. Eng., Univ. of California Los Angeles, Los Angeles, CA, USA
Abstract :
Designing reliable integrated systems has become a major challenge with shrinking geometries, increasing fault rates, and devices that age substantially over their usage life. The proposed research is motivated by the observation that many in-field failures are delay failures and that several variability signatures are also delay-related. The origins of temporal delay fluctuations include manufacturing variability, voltage/temperature changes, and Vth degradation related to negative or positive bias temperature instability. Since the actual delay changes depend on process variations as well as workload, on-chip monitoring may be the best way of predicting them. Circuit performance needs to be monitored during manufacturing as well as at runtime to predict achievable performance and to warn against impending failures. Adaptive mechanisms in hardware and/or software can optimize the trade-off between errors, energy, and performance based on feedback from runtime circuit performance monitors. This paper presents approaches for automated synthesis of design-dependent performance monitors, which can be used to predict impending delay failures relatively inexpensively. For low-overhead monitoring, we propose multiple design-dependent ring oscillators (DDROs) as smart canary structures that can reliably predict achievable chip frequency, but with margins for local variations. Early silicon results indicate that DDROs can reduce delay monitoring error by 35% compared to conventional ring oscillators. To further improve the prediction (albeit at a higher overhead), we propose in-situ slack monitors (SlackProbe), which can also match local variations at overheads much smaller than monitoring all sequential elements. SlackProbe reduces the number of monitors required by over 15X with 5% additional delay margin in several commercial processor benchmarks. Finally, we show an example software testbed that demonstrates a variability-aware system that uses the hardware monitors and operates with both hardware and software adaptation.
Keywords :
fault tolerant computing; logic design; oscillators; DDRO; SlackProbe; adaptive mechanisms; bias temperature instability; circuit performance monitoring; delay failures; delay margin; design-dependent performance monitors; design-dependent ring oscillators; hardware adaptation; in-field failures; integrated systems; manufacturing variability; on-chip monitoring; processor benchmarks; runtime circuit performance monitors; smart canary structures; software adaptation; temporal delay fluctuations; usage life; variability signatures; variability-aware system; variability-aware systems; voltage-temperature changes; Delays; Hardware; Logic gates; Monitoring; Sensitivity; Software;
Conference_Title :
Design Automation Conference (ASP-DAC), 2014 19th Asia and South Pacific
Conference_Location :
Singapore
DOI :
10.1109/ASPDAC.2014.6742935