• DocumentCode
    2453652
  • Title

    Leveraging 3D Technology for Improved Reliability

  • Author

    Madan, Niti ; Balasubramonian, Rajeev

  • Author_Institution
    Univ. of Utah, Salt Lake City
  • fYear
    2007
  • fDate
    1-5 Dec. 2007
  • Firstpage
    223
  • Lastpage
    235
  • Abstract
    Aggressive technology scaling over the years has helped improve processor performance but has caused a reduction in processor reliability. Shrinking transistor sizes and lower supply voltages have increased the vulnerability of computer systems towards transient faults. An increase in within-die and die-to-die parameter variations has also led to a greater number of dynamic timing errors. A potential solution to mitigate the impact of such errors is redundancy via an in-order checker processor. Emerging 3D chip technology promises increased processor performance as well as reduced power consumption because of shorter on-chip wires. In this paper, we leverage the "snap-on" functionality provided by 3D integration and propose implementing the redundant checker processor on a second die. This allows manufacturers to easily create a family of "reliable processors" without significantly impacting the cost or performance for customers that care less about reliability. We comprehensively evaluate design choices for this second die, including the effects of L2 cache organization, deep pipelining, and frequency. An interesting feature made possible by 3D integration is the incorporation of heterogeneous process technologies within a single chip. We evaluate the possibility of providing redundancy with an older process technology, an unexplored and especially compelling application of die heterogeneity. We show that with the most pessimistic assumptions, the overhead of the second die can be as high as either a 7°C temperature increase or an 8% performance loss. However, with the use of an older process, this overhead can be reduced to a 3°C temperature increase or a 4% performance loss, while also providing higher error resilience.
  • Keywords
    circuit reliability; pipeline processing; system-on-chip; 3D die stacking; 3D integration; L2 cache organization; deep pipelining; heterogeneous process technology; on-chip temperature; processor reliability; redundant checker processor; single chip; snap-on functionality; Computer errors; Dynamic voltage scaling; Energy consumption; Manufacturing processes; Performance loss; Redundancy; Temperature; Timing; Transistors; Wires;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Microarchitecture, 2007. MICRO 2007. 40th Annual IEEE/ACM International Symposium on
  • Conference_Location
    Chicago, IL
  • ISSN
    1072-4451
  • Print_ISBN
    978-0-7695-3047-5
  • Electronic_ISBN
    1072-4451
  • Type

    conf

  • DOI
    10.1109/MICRO.2007.31
  • Filename
    4408258