• DocumentCode
    2460237
  • Title
    Federate Fault Tolerance in HLA-Based Simulation
  • Author
    Li, Zengxiang ; Cai, Wentong ; Turner, Stephen John ; Pan, Ke
  • Author_Institution
    Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore, Singapore
  • fYear
    2010
  • fDate
    17-19 May 2010
  • Firstpage
    1
  • Lastpage
    10
  • Abstract
    A large-scale HLA-based simulation (federation) is composed of a large number of simulation components (federates), which may be developed by different participants and executed at different locations. These federates are subject to failures for various reasons. Worse, the risk of federation failure increases with the number of federates in the federation. In this paper, a fault tolerance mechanism is proposed to tolerate the crash-stop failures of federates. By exploiting the decoupled federate architecture, federate failures can be masked from the federation and recovery can take place without interrupting the execution of other federates. A basic state recovery protocol is first proposed to recover the state of the failed federate, relying on the checkpoint and message log taken before the failure. Then, an optimized protocol is developed to accelerate the state recovery procedure. Experiments are carried out to verify that the proposed mechanism provides correct failure recovery. The experimental results also indicate that the optimized protocol can outperform the basic one considerably.
  • Keywords
    fault tolerance; software architecture; system monitoring; system recovery; HLA-based simulation; crash-stop failures; fault tolerance mechanism; federate fault tolerance; message logging; Acceleration; Biological system modeling; Computational modeling; Computer crashes; Concurrent computing; Distributed computing; Fault tolerance; Large-scale systems; Protocols; Standards development
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Principles of Advanced and Distributed Simulation (PADS), 2010 IEEE Workshop on
  • Conference_Location
    Atlanta
  • ISSN
    1087-4097
  • Print_ISBN
    978-1-4244-7292-5
  • Electronic_ISBN
    1087-4097
  • Type
    conf
  • DOI
    10.1109/PADS.2010.5471663
  • Filename
    5471663