• DocumentCode
    400780
  • Title
    Dynamic fault-tolerance and metrics for battery powered, failure-prone systems
  • Author
    Stanley-Marbell, P.; Marculescu, Diana

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Carnegie Mellon Univ., Pittsburgh, PA, USA
  • fYear
    2003
  • fDate
    9-13 Nov. 2003
  • Firstpage
    633
  • Lastpage
    640
  • Abstract
    Emerging VLSI technologies and platforms are giving rise to systems with inherently high potential for runtime failure. Such failures range from intermittent electrical and mechanical failures at the system level to device failures at the chip level. Techniques to provide reliable computation in the presence of failures must do so while maintaining high performance, with an eye toward energy efficiency. When possible, they should maximize battery lifetime in the face of battery discharge non-linearities. This paper introduces the concept of adaptive fault-tolerance management for failure-prone systems, and a classification of local algorithms for achieving system-wide reliability. To judge the efficacy of the proposed algorithms for dynamic fault-tolerance management, a set of metrics is presented for characterizing system behavior in terms of energy efficiency, reliability, computation performance and battery lifetime. For an example platform employed in a realistic evaluation scenario, it is shown that system configurations with the best performance and lifetime are not necessarily those with the best combination of performance, reliability, battery lifetime and average power consumption.
  • Keywords
    VLSI; battery management systems; fault tolerance; power consumption; reliability; VLSI technology; adaptive fault tolerance management; battery discharge nonlinearities; battery lifetime; battery powered failure-prone systems; device failures; dynamic fault tolerance; intermittent electrical failure; intermittent mechanical failure; metrics; runtime failure; very large scale integration; Batteries; Classification algorithms; Energy efficiency; Fault tolerant systems; Heuristic algorithms; High performance computing; Maintenance; Power system reliability; Runtime
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Aided Design, 2003. ICCAD-2003. International Conference on
  • Conference_Location
    San Jose, CA, USA
  • Print_ISBN
    1-58113-762-1
  • Type
    conf
  • DOI
    10.1109/ICCAD.2003.159747
  • Filename
    1257877