• DocumentCode
    888372
  • Title

    Active replication of multithreaded applications

  • Author

    Basile, Claudio ; Kalbarczyk, Zbigniew ; Iyer, Ravishankar K.

  • Author_Institution
    Center for Reliable & High-Performance Comput., Illinois Univ., Urbana, IL, USA
  • Volume
    17
  • Issue
    5
  • fYear
    2006
  • fDate
    5/1/2006 12:00:00 AM
  • Firstpage
    448
  • Lastpage
    465
  • Abstract
    Software-based active replication is expensive in terms of performance overhead. Multithreading can help improve performance; however, thread scheduling is a source of nondeterminism in replica behavior. To achieve strong replica consistency in multithreaded environments, this paper proposes intercepting mutex lock/unlock operations performed by threads on accessing the shared data and contributes with two algorithmic solutions: 1) a loose synchronization algorithm (LSA), which captures the natural concurrency in a leader replica and projects it on follower replicas through interreplica communication, and 2) a preemptive deterministic scheduler (PDS) algorithm, which removes the need for interreplica communication through the notion of round and by suspending threads when it is unable (yet) to schedule them deterministically. Failure behavior and performance of LSA and PDS implementations are evaluated in a triplicated system and compared with existing solutions. A performance evaluation indicates that LSA and PDS outperform existing solutions, with PDS offering lower throughput than LSA. A fault-injection campaign shows that PDS is more robust to errors due to the absence of interreplica communication. Hence, LSA and PDS represent a trade-off between performance and dependability. Finally, LSA and PDS are demonstrated in replicating the Apache Web server, a substantial real-world application.
  • Keywords
    Internet; fault tolerant computing; multi-threading; replicated databases; Apache Web server; LSA; PDS algorithm; interreplica communication; loose synchronization algorithm; multithreaded application; mutex lock operation; mutex unlock operation; preemptive deterministic scheduler; software-based active replication; Concurrent computing; Fault tolerance; Interleaved codes; Multithreading; Redundancy; Robustness; Scheduling algorithm; Throughput; Web server; Fault tolerance; fault injection.; multithreading; nondeterminism; replication;
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    1045-9219
  • Type

    jour

  • DOI
    10.1109/TPDS.2006.56
  • Filename
    1613853