DocumentCode :
3244309
Title :
Dynamic data replication: an approach to providing fault-tolerant shared memory clusters
Author :
Christodoulopoulou, Rosalia ; Azimi, Reza ; Bilas, Angelos
Author_Institution :
Dept. of Comput. Sci., Toronto Univ., Ont., Canada
fYear :
2003
fDate :
8-12 Feb. 2003
Firstpage :
203
Lastpage :
214
Abstract :
A challenging issue in today's server systems is to transparently deal with failures and application-imposed requirements for continuous operation. In this paper we address this problem in shared virtual memory (SVM) clusters at the programming abstraction layer. We design extensions to an existing SVM protocol that has been tuned for low-latency, high-bandwidth interconnects and SMP nodes and we achieve reliability through dynamic replication of application shared data and protocol information. Our extensions allow us to tolerate single (or multiple, but not simultaneous) node failures. We implement our extensions on a state-of-the-art cluster and we evaluate the common, failure-free case. We find that, although the complexity of our protocol is substantially higher than its failure-free counterpart, by taking advantage of architectural features of modern systems our approach imposes low overhead and can be employed for transparently dealing with system failures.
Keywords :
computer network reliability; fault tolerant computing; local area networks; performance evaluation; protocols; shared memory systems; SMP nodes; SVM clusters; SVM protocol; dynamic data replication; fault-tolerant shared memory clusters; high-bandwidth interconnects; low-latency interconnects; node failures; programming abstraction layer; reliability; server systems; shared virtual memory clusters; Availability; Buildings; Concurrent computing; Costs; Fault tolerance; Focusing; Monitoring; Operating systems; Protocols; Support vector machines;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings. The Ninth International Symposium on
ISSN :
1530-0897
Print_ISBN :
0-7695-1871-0
Type :
conf
DOI :
10.1109/HPCA.2003.1183538
Filename :
1183538