• DocumentCode
    1340601
  • Title

    Automatic Reconfiguration for Large-Scale Reliable Storage Systems

  • Author

    Rodrigues, Rodrigo ; Liskov, Barbara ; Chen, Kathryn ; Liskov, Moses ; Schultz, David

  • Author_Institution
    Univ. Nova de Lisboa, Monte da Caparica, Portugal
  • Volume
    9
  • Issue
    2
  • fYear
    2012
  • Firstpage
    145
  • Lastpage
    158
  • Abstract
    Byzantine-fault-tolerant replication enhances the availability and reliability of Internet services that store critical state and preserve it despite attacks or software errors. However, existing Byzantine-fault-tolerant storage systems either assume a static set of replicas, or have limitations in how they handle reconfigurations (e.g., in terms of the scalability of the solutions or the consistency levels they provide). This can be problematic in long-lived, large-scale systems where system membership is likely to change during the system lifetime. In this paper, we present a complete solution for dynamically changing system membership in a large-scale Byzantine-fault-tolerant system. We present a service that tracks system membership and periodically notifies other system nodes of membership changes. The membership service runs mostly automatically, to avoid human configuration errors; is itself Byzantine-fault-tolerant and reconfigurable; and provides applications with a sequence of consistent views of the system membership. We demonstrate the utility of this membership service by using it in a novel distributed hash table called dBQS that provides atomic semantics even across changes in replica sets. dBQS is interesting in its own right because its storage algorithms extend existing Byzantine quorum protocols to handle changes in the replica set, and because it differs from previous DHTs by providing Byzantine fault tolerance and offering strong semantics. We implemented the membership service and dBQS. Our results show that the approach works well, in practice: the membership service is able to manage a large system and the cost to change the system membership is low.
  • Keywords
    Internet; security of data; software fault tolerance; Byzantine fault tolerant replication; Byzantine quorum protocols; Internet services; automatic reconfiguration; distributed hash table; large scale reliable storage systems; software errors; storage algorithms; Distributed processing; Fault detection; Fault tolerance; Membership renewal; Protocols; Public key; Semantics; Byzantine fault tolerance; distributed hash tables; dynamic system membership; membership service
  • fLanguage
    English
  • Journal_Title
    Dependable and Secure Computing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5971
  • Type

    jour

  • DOI
    10.1109/TDSC.2010.52
  • Filename
    5593239