• DocumentCode
    26377
  • Title

    A Low-Cost Mechanism Exploiting Narrow-Width Values for Tolerating Hard Faults in ALU

  • Author

    Seokin Hong ; Soontae Kim

  • Author_Institution
    Dept. of Comput. Sci., Korea Adv. Inst. of Sci. & Technol., Daejeon, South Korea
  • Volume
    64
  • Issue
    9
  • fYear
    2015
  • fDate
    Sept. 1 2015
  • Firstpage
    2433
  • Lastpage
    2446
  • Abstract
    Digital circuits are expected to suffer from an increasing number of hard faults due to technology scaling. In particular, a single hard fault in the ALU (Arithmetic Logic Unit) might lead to a total failure of the processor or significantly reduce its performance. To address these increasingly important problems, we propose a novel cost-efficient fault-tolerant mechanism for the ALU, called LIZARD. LIZARD employs two half-word ALUs, instead of a single full-word ALU, to perform computations with concurrent fault detection. When a fault is detected, the two ALUs are partitioned into four quarter-word ALUs. After diagnosing and isolating a faulty quarter-word ALU, LIZARD continues its operation using the remaining ones, which can detect and isolate another fault. Even though LIZARD uses narrow ALUs for computations, it adds negligible performance overhead by exploiting the predictability of results in arithmetic computations. We also present the architectural modifications required to employ LIZARD in scalar as well as superscalar processors. Through comparative evaluation, we demonstrate that LIZARD outperforms other competitive fault-tolerant mechanisms in terms of area, energy consumption, performance, and reliability.
  • Keywords
    digital arithmetic; digital circuits; failure analysis; fault tolerance; logic circuits; ALU; LIZARD; arithmetic logic unit; failure; fault tolerant mechanism; hard faults; low-cost mechanism; narrow-width values; processors; Adders; Circuit faults; Fault detection; Fault diagnosis; Fault tolerant systems; Program processors; Arithmetic and logic units; Processor architectures
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type

    jour

  • DOI
    10.1109/TC.2014.2366743
  • Filename
    6945834