• DocumentCode
    124394
  • Title
    Shadow Computing: An energy-aware fault tolerant computing model
  • Author
    Mills, B.; Znati, Taieb; Melhem, Rami
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Pittsburgh, Pittsburgh, PA, USA
  • fYear
    2014
  • fDate
    3-6 Feb. 2014
  • Firstpage
    73
  • Lastpage
    77
  • Abstract
    Current approaches to fault tolerance rely upon either time or hardware redundancy to mask faults. Time redundancy implies re-execution of the failed computation after the failure has been detected; although this can be further optimized by the use of checkpoints, these solutions still impose a significant delay. In many mission-critical systems, hardware redundancy has traditionally been deployed in the form of process replication to provide fault tolerance, avoiding delay and maintaining tight deadlines. Both approaches have drawbacks: re-execution requires additional time, and replication requires additional resources, especially energy. This forces the systems engineer to choose between time and hardware redundancy; cloud computing environments have largely chosen replication because response time is often critical. In this paper we propose a new computational model called shadow computing, which provides goal-based adaptive resilience through the use of dynamic execution. Using this general model, we develop shadow replication, which enables a parameterized tradeoff between time and hardware redundancy to provide fault tolerance. We then build an analytical model to predict the expected energy savings and provide an analysis using that model.
  • Keywords
    cloud computing; fault tolerant computing; redundancy; cloud computing environments; dynamic execution; energy-aware fault tolerant computing; expected energy savings; failure detection; goal-based adaptive resilience; hardware redundancy; mission critical systems; parameterized tradeoff; process replication; shadow computing; shadow replication; systems engineer; time redundancy; Computational modeling; fault tolerance; resiliency; scheduling
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computing, Networking and Communications (ICNC), 2014 International Conference on
  • Conference_Location
    Honolulu, HI
  • Type
    conf
  • DOI
    10.1109/ICCNC.2014.6785308
  • Filename
    6785308