• DocumentCode
    66907
  • Title

    Demonstrating HW–SW Transient Error Mitigation on the Single-Chip Cloud Computer Data Plane

  • Author

    Rodopoulos, Dimitrios ; Papanikolaou, Antonis ; Catthoor, Francky ; Soudris, Dimitrios

  • Author_Institution
    Micro Lab., Nat. Tech. Univ. of Athens, Athens, Greece
  • Volume
    23
  • Issue
    3
  • fYear
    2015
  • fDate
    Mar. 2015
  • Firstpage
    507
  • Lastpage
    519
  • Abstract
    Transient errors are a major concern for the correct operation of low-level cache memories. Aggressive integration requires effective mitigation of such errors, without extreme overheads in power, timing, or silicon area. We demonstrate a hybrid (hardware-software) scheme that mitigates bit flips in data that reside in low-level caches. The methodology is shown to be applicable in streaming applications and we illustrate that with a video decoding case study on a state-of-the-art many-core chip. The single-chip cloud computer is an experimental processor created by Intel Labs. Dedicated on-chip memories are utilized to keep safe copies for key application data, thus allowing rollbacks upon error detection. The experimental results illustrate the tradeoff between application delay, consumed energy, and output fidelity as the injected errors are corrected. When output fidelity is considered as a hard constraint, application slack used for mitigation can be reclaimed with dynamic frequency scaling. Output fidelity is guaranteed regardless of the error injection intensity and the application's timing constraints are respected up to a certain upper bound of error injection.
  • Keywords
    cache storage; cloud computing; error detection; microprocessor chips; video coding; HW-SW transient error mitigation; Intel Labs; application timing constraints; dynamic frequency scaling; hybrid hardware-software scheme; low-level cache memories; many-core chip; on-chip memories; single-chip cloud computer data plane; video decoding; Cache storage; Decoding; Error analysis; Transform coding; Transient analysis; Dynamic frequency scaling (DFS); Joint Photographic Experts Group (JPEG) format; Motion JPEG (MJPEG); single-chip cloud computer (SCC); transient errors
  • fLanguage
    English
  • Journal_Title
    Very Large Scale Integration (VLSI) Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1063-8210
  • Type

    jour

  • DOI
    10.1109/TVLSI.2014.2309663
  • Filename
    6784042