  • DocumentCode
    2262745
  • Title
    A highly available transaction processing system with non-disruptive failure handling
  • Author
    Su, Gong; Iyengar, Arun
  • Author_Institution
    IBM T.J. Watson Res. Center, Yorktown Heights, NY, USA
  • fYear
    2012
  • fDate
    16-20 April 2012
  • Firstpage
    409
  • Lastpage
    416
  • Abstract
    We present a highly available system for environments such as stock trading, where high request rates and low latency requirements dictate that service disruption on the order of seconds in length can be unacceptable. After a node failure, our system avoids delays in processing due to detecting the failure or transferring control to a back-up node. We achieve this by using multiple primary nodes which process transactions concurrently as peers. If a primary node fails, the remaining primaries continue executing without being delayed at all by the failed primary. Nodes agree on a total ordering for processing requests with a novel low-overhead wait-free algorithm that utilizes a small amount of shared memory accessible to the nodes and a simple compare-and-swap-like protocol which allows the system to progress at the speed of the fastest node. We have implemented our system and show experimentally that it performs well and can transparently handle node failures without causing delays to transaction processing. The efficient implementation of our algorithm for ordering transactions is a critically important factor in achieving good performance.
  • Keywords
    concurrency control; shared memory systems; system recovery; transaction processing; back-up node; compare-and-swap like protocol; control transfer; delays; highly available transaction processing system; node failure handling; nondisruptive failure handling; service disruption; shared memory; stock trading; wait-free algorithm; Algorithm design and analysis; Delay; Fault tolerance; Fault tolerant systems; Peer to peer computing; Protocols; Synchronization; computer-driven trading; fault tolerance; high availability; total ordering algorithm
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Network Operations and Management Symposium (NOMS), 2012 IEEE
  • Conference_Location
    Maui, HI
  • ISSN
    1542-1201
  • Print_ISBN
    978-1-4673-0267-8
  • Electronic_ISBN
    1542-1201
  • Type
    conf
  • DOI
    10.1109/NOMS.2012.6211925
  • Filename
    6211925
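
  • Illustrative sketch (not part of the record)
    The abstract says nodes agree on a total order through a small region of shared memory and a compare-and-swap-like protocol, so that the system advances at the speed of the fastest node. The record gives no further detail, so the following is only a minimal sketch of one way such a scheme could look, not the authors' algorithm. Everything named here is an assumption: `SharedSlots`, `compare_and_swap`, and `order_requests` are hypothetical, a lock merely emulates an atomic hardware compare-and-swap, and every node is assumed to receive the same set of requests (possibly in different arrival orders).

```python
import threading

EMPTY = None  # marker for a slot that no node has claimed yet


class SharedSlots:
    """Hypothetical stand-in for the shared memory region.

    The lock only emulates the atomicity a hardware compare-and-swap
    would provide; it is not part of the protocol itself.
    """

    def __init__(self, size):
        self._slots = [EMPTY] * size
        self._lock = threading.Lock()

    def compare_and_swap(self, index, expected, new):
        """Write `new` if the slot holds `expected`; return the slot's value afterwards."""
        with self._lock:
            if self._slots[index] is expected:
                self._slots[index] = new
            return self._slots[index]


def order_requests(shared, arrivals):
    """Return the order in which this node processes `arrivals`.

    For each slot the node proposes its earliest request that is not yet
    ordered. If another node already claimed the slot, the node adopts
    that node's choice instead of waiting for anyone.
    """
    decided = []
    seen = set()
    for slot in range(len(arrivals)):
        proposal = next(r for r in arrivals if r not in seen)
        winner = shared.compare_and_swap(slot, EMPTY, proposal)
        decided.append(winner)
        seen.add(winner)
    return decided


# A fast node fills the slots first; a slower node with a different
# arrival order ends up adopting the same sequence.
shared = SharedSlots(3)
fast = order_requests(shared, ["buy-1", "sell-2", "buy-3"])
slow = order_requests(shared, ["buy-3", "buy-1", "sell-2"])
```

    In this sketch a node never blocks on a peer: a failed compare-and-swap simply returns the value that is already in the slot, so a stalled or failed node cannot hold up the others.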