• DocumentCode
    2372244
  • Title

    A local checkpoint mechanism for on-board computing

  • Author

    Zhang, Chengye ; Deng, Shenglan ; Ning, Hong

  • Author_Institution
    Dept. of Comput. Sci., Nat. Univ. of Defense Technol., Changsha, China
  • fYear
    2012
  • fDate
    23-25 March 2012
  • Firstpage
    520
  • Lastpage
    526
  • Abstract
    Recent advances in space applications have increased the requirements for more real-time and reliable on-board computation. Running in a harsh space environment, on-board computers can frequently suffer transient faults, such as single event upsets (SEUs). Reloading tasks or rebooting the system may recover from these faults, but doing so seriously affects task deadlines and increases resource and energy consumption. We propose a feasible and efficient local checkpoint model (LCM). Based on the reliability of the memory subsystem and the availability of soft-implemented fault detection techniques, LCM uses segmented rollback recovery. We also implemented a local checkpoint mechanism (LCMech) on VxWorks. Experimental results show that LCMech has an advantage in space and time overhead and a strong fault recovery capability when supported by soft-implemented error detection. Therefore, the local checkpoint can satisfy the requirements for real-time and reliable on-board computation and improve the efficiency of fault tolerance for computer systems in space.
  • Keywords
    checkpointing; error detection; fault diagnosis; fault tolerant computing; real-time systems; spacecraft computers; LCM; LCMech; SEU; VxWorks; computer system fault tolerance; energy consumption; fault recovery; local checkpoint mechanism; local checkpoint model; memory subsystem reliability; on-board computing; real-time on-board computation; rebooting system; reloading tasks; resource consumption; segmented rollback recovery; single event upset; soft-implemented error detection; soft-implemented fault detection technique availability; space application; space overhead; task deadlines; time overhead; transient faults; Computers; Fault detection; Fault tolerance; Fault tolerant systems; Real time systems; Transient analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Science and Technology (ICIST), 2012 International Conference on
  • Conference_Location
    Hubei
  • Print_ISBN
    978-1-4577-0343-0
  • Type

    conf

  • DOI
    10.1109/ICIST.2012.6221701
  • Filename
    6221701