• DocumentCode
    1072250
  • Title

    AQuA: an adaptive architecture that provides dependable distributed objects

  • Author

    Ren, Yansong ; Bakken, David E. ; Courtney, Tod ; Cukier, Michel ; Karr, David A. ; Rubel, Paul ; Sabnis, Chetan ; Sanders, William H. ; Schantz, Richard E. ; Seri, Mouna

  • Author_Institution
    Bell Labs., Holmdel, NJ, USA
  • Volume
    52
  • Issue
    1
  • fYear
    2003
  • Firstpage
    31
  • Lastpage
    50
  • Abstract
    Building dependable distributed systems from commercial off-the-shelf components is of growing practical importance. For both cost and production reasons, there is interest in approaches and architectures that facilitate building such systems. The AQuA architecture is one such approach; its goal is to provide adaptive fault tolerance to CORBA applications by replicating objects. The AQuA architecture allows application programmers to request desired levels of dependability during applications' runtimes. It provides fault tolerance mechanisms to ensure that a CORBA client can always obtain reliable services, even if the CORBA server object that provides the desired services suffers from crash failures and value faults. AQuA includes a replicated dependability manager that provides dependability management by configuring the system in response to applications' requests and changes in system resources due to faults. It uses Maestro/Ensemble to provide group communication services. It contains a gateway to intercept standard CORBA IIOP messages to allow any standard CORBA application to use AQuA. It provides different types of replication schemes to forward messages reliably to the remote replicated objects. All of the replication schemes ensure strong data consistency among replicas. This paper describes the AQuA architecture and presents, in detail, the active replication pass-first scheme. In addition, the interface to the dependability manager and the design of the dependability manager replication are also described. Finally, we describe performance measurements that were conducted for the active replication pass-first scheme, and we present results from our study of fault detection, recovery, and blocking times.
  • Keywords
    data integrity; distributed object management; quality of service; software fault tolerance; AQuA; CORBA; active replication pass-first scheme; adaptive architecture; adaptive fault tolerance; data consistency; dependable distributed objects; performance measurements; replicated dependability manager; replication schemes; system resources; Buildings; Computer crashes; Costs; Fault detection; Fault tolerance; Measurement; Production systems; Programming profession; Resource management; Runtime;
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    0018-9340
  • Type

    jour

  • DOI
    10.1109/TC.2003.1159752
  • Filename
    1159752