  • DocumentCode
    1962265
  • Title
    Trading off power and fault-tolerance in real-time embedded systems
  • Author
    Panerati, Jacopo; Beltrame, Giovanni
  • Author_Institution
    Dept. de Genie Inf. et Genie Logiciel, Ecole Polytech. de Montreal, Montreal, QC, Canada
  • fYear
    2015
  • fDate
    15-18 June 2015
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    Reliability and fault-tolerance are essential requirements of critical, autonomous computing systems. In this paper, we propose a methodology to quantify, and maximize, the reliability of computation in the presence of transient errors when considering the mapping of real-time tasks on a homogeneous multiprocessor system with voltage and frequency scaling capabilities. As the likelihood of transient errors due to radiation is environment- and component-specific, we use machine learning to estimate the actual fault-rate of the system. Furthermore, we leverage probability theory to define a trade-off between power consumption and fault-tolerance. If a processing element fails, our methodology is able to re-map the application, establishing whether the real-time requirements will still be met, and how reliable the new, impaired system will be. Results show that the proposed methodology is able to adjust mapping and operating frequencies in order to maintain a fixed level of reliability for different fault-rates.
  • Keywords
    embedded systems; estimation theory; fault tolerant computing; learning (artificial intelligence); multiprocessing systems; power aware computing; probability; actual fault-rate estimation; critical autonomous computing systems; fault-tolerance; frequency scaling capabilities; homogeneous multiprocessor system; machine learning; power consumption; probability theory; real-time embedded systems; transient errors; voltage scaling capabilities; Fault tolerance; Fault tolerant systems; Multiprocessing systems; Power demand; Real-time systems; Transient analysis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Adaptive Hardware and Systems (AHS), 2015 NASA/ESA Conference on
  • Conference_Location
    Montreal, QC
  • Type
    conf
  • DOI
    10.1109/AHS.2015.7231175
  • Filename
    7231175