• DocumentCode
    1338918
  • Title
    A Flexible Approach to Improving System Reliability with Virtual Lockstep
  • Author
    Jeffery, Casey M. ; Figueiredo, Renato J O
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Florida, Gainesville, FL, USA
  • Volume
    9
  • Issue
    1
  • fYear
    2012
  • Firstpage
    2
  • Lastpage
    15
  • Abstract
    There is an increasing need for fault tolerance capabilities in logic devices brought about by the scaling of transistors to ever smaller geometries. This paper presents a hypervisor-based replication approach that can be applied to commodity hardware to allow for virtually lockstepped execution. It offers many of the benefits of hardware-based lockstep while being cheaper and easier to implement and more flexible in the configurations supported. A novel form of processor state fingerprinting is also presented, which can significantly reduce the fault detection latency. This further improves reliability by triggering rollback recovery before errors are recorded to a checkpoint. The mechanisms are validated using a full prototype, and the benchmarks considered indicate an average performance overhead of approximately 14 percent with the possibility for significant optimization. Finally, a unique method of using virtual lockstep for fault injection testing is presented and used to show that significant detection latency reduction is achievable by comparing only a small amount of data across replicas.
  • Keywords
    electronic engineering computing; fault diagnosis; fault tolerance; integrated circuit reliability; logic circuits; optimisation; transistors; detection latency reduction; fault detection latency; fault injection testing; fault tolerance capabilities; hypervisor based replication approach; logic devices; optimization; processor state fingerprinting; rollback recovery; system reliability; transistors scaling; virtually lockstepped execution; Fault detection; Fingerprint recognition; Hardware; Prototypes; Reliability engineering; Virtualization; autonomic computing; dependable architectures; fault injection; software reliability;
  • fLanguage
    English
  • Journal_Title
    Dependable and Secure Computing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5971
  • Type
    jour
  • DOI
    10.1109/TDSC.2010.53
  • Filename
    5590258