• DocumentCode
    3130032
  • Title
    The case for lifetime reliability-aware microprocessors
  • Author
    Srinivasan, Jayanth ; Adve, Sarita V. ; Bose, Pradip ; Rivers, Jude A.
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA
  • fYear
    2004
  • fDate
    19-23 June 2004
  • Firstpage
    276
  • Lastpage
    287
  • Abstract
    Ensuring long processor lifetimes by limiting failures due to wear-out-related hard errors is a critical requirement for all microprocessor manufacturers. We observe that continuous device scaling and increasing temperatures are making lifetime reliability targets even harder to meet. However, current methodologies for qualifying lifetime reliability are overly conservative since they assume worst-case operating conditions. This paper makes the case that the continued use of such methodologies will significantly and unnecessarily constrain performance. Instead, lifetime reliability awareness at the microarchitectural design stage can mitigate this problem by designing processors that dynamically adapt in response to the observed usage to meet a reliability target. We make two specific contributions. First, we describe an architecture-level model and its implementation, called RAMP, that can dynamically track lifetime reliability, responding to changes in application behavior. RAMP is based on state-of-the-art device models for different wear-out mechanisms. Second, we propose dynamic reliability management (DRM), a technique in which the processor can respond to changing application behavior to maintain its lifetime reliability target. In contrast to current reliability qualification methodologies based on worst-case behavior, DRM allows processors to be qualified for reliability at lower (but more likely) operating points than the worst case. Using RAMP, we show that this can save cost and/or improve performance, that dynamic voltage scaling is an effective response technique for DRM, and that dynamic thermal management neither subsumes nor is subsumed by DRM.
  • Keywords
    computer architecture; integrated circuit modelling; integrated circuit reliability; microprocessor chips; reliability; DRM; RAMP; architecture-level model; device scaling; dynamic reliability management; dynamic thermal management; lifetime reliability-aware microprocessors; microarchitectural design; microprocessor manufacturing; processor design; processor lifetimes; reliability qualification; wear-out related hard errors; worst-case operating conditions; Costs; Maintenance; Manufacturing processes; Microarchitecture; Microprocessors; Process design; Qualifications; Target tracking; Temperature; Thermal management;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Architecture, 2004. Proceedings. 31st Annual International Symposium on
  • ISSN
    1063-6897
  • Print_ISBN
    0-7695-2143-6
  • Type
    conf
  • DOI
    10.1109/ISCA.2004.1310781
  • Filename
    1310781