DocumentCode
2262745
Title
A highly available transaction processing system with non-disruptive failure handling
Author
Su, Gong ; Iyengar, Arun
Author_Institution
IBM T.J. Watson Res. Center, Yorktown Heights, NY, USA
fYear
2012
fDate
16-20 April 2012
Firstpage
409
Lastpage
416
Abstract
We present a highly available system for environments such as stock trading, where high request rates and low-latency requirements mean that service disruptions lasting even a few seconds can be unacceptable. After a node failure, our system avoids the processing delays normally incurred by detecting the failure or transferring control to a back-up node. We achieve this by using multiple primary nodes that process transactions concurrently as peers. If a primary node fails, the remaining primaries continue executing without being delayed at all by the failed primary. Nodes agree on a total ordering for processing requests with a novel low-overhead, wait-free algorithm that uses a small amount of shared memory accessible to the nodes and a simple compare-and-swap-like protocol, which allows the system to progress at the speed of the fastest node. We have implemented our system and show experimentally that it performs well and can transparently handle node failures without delaying transaction processing. The efficient implementation of our algorithm for ordering transactions is critical to achieving good performance.
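The ordering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, whose details the abstract does not give. It assumes every primary receives the same set of requests (possibly in different arrival orders), that the shared memory is an array of slots, and that slot i records the i-th request in the agreed order. All names (`SharedSlots`, `Node`, `run_demo`) are hypothetical, and a lock stands in for a hardware compare-and-swap.

```python
import threading

EMPTY = None  # marker for a slot whose request has not been decided yet


class SharedSlots:
    """Hypothetical stand-in for the shared memory; a lock emulates an atomic CAS."""

    def __init__(self, size):
        self._slots = [EMPTY] * size
        self._lock = threading.Lock()

    def compare_and_swap(self, index, expected, new):
        """Set slot[index] to new if it still holds expected; return the slot's value afterwards."""
        with self._lock:
            if self._slots[index] is expected:
                self._slots[index] = new
            return self._slots[index]


class Node:
    """A primary that proposes its own next request and adopts whichever request won the slot."""

    def __init__(self, name, slots, pending):
        self.name = name
        self.slots = slots
        self.pending = list(pending)  # requests in this node's arrival order
        self.executed = []            # agreed order as observed by this node
        self.next_slot = 0

    def step(self):
        """Decide one slot with a single CAS; never waits for any other node."""
        candidates = [r for r in self.pending if r not in self.executed]
        if not candidates:
            return False
        winner = self.slots.compare_and_swap(self.next_slot, EMPTY, candidates[0])
        self.executed.append(winner)  # process the winner, ours or not
        self.next_slot += 1
        return True


def run_demo():
    slots = SharedSlots(4)
    a = Node("A", slots, ["r1", "r2", "r3", "r4"])
    b = Node("B", slots, ["r2", "r1", "r4", "r3"])
    c = Node("C", slots, ["r1", "r3", "r2", "r4"])
    a.step()  # A wins slot 0 with r1
    b.step()  # B proposes r2 for slot 0, loses, and adopts r1
    b.step()  # B wins slot 1 with r2
    # C has "failed" and never takes a step; A and B do not wait for it.
    while a.step() or b.step():
        pass
    return a.executed, b.executed, c.executed
```

In this sketch a node that loses a compare-and-swap simply adopts the value already in the slot, so the surviving nodes end up with the same order and the slots fill at the pace of whichever node gets there first. A failed node leaves no lock or partial state behind for the others to clean up.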
Keywords
concurrency control; shared memory systems; system recovery; transaction processing; back-up node; compare-and-swap like protocol; control transfer; delays; highly available transaction processing system; node failure handling; nondisruptive failure handling; service disruption; shared memory; stock trading; wait-free algorithm; Algorithm design and analysis; Delay; Fault tolerance; Fault tolerant systems; Peer to peer computing; Protocols; Synchronization; computer-driven trading; fault tolerance; high availability; total ordering algorithm; transaction processing;
fLanguage
English
Publisher
IEEE
Conference_Titel
Network Operations and Management Symposium (NOMS), 2012 IEEE
Conference_Location
Maui, HI
ISSN
1542-1201
Print_ISBN
978-1-4673-0267-8
Electronic_ISBN
1542-1201
Type
conf
DOI
10.1109/NOMS.2012.6211925
Filename
6211925