Title :
Disaster tolerant Wolfpack geo-clusters
Author :
Wilkins, Richard S. ; Du, Xing ; Cochran, Robert A. ; Popp, Matthias
Author_Institution :
Hewlett-Packard Co., Bellevue, WA, USA
Abstract :
Clustering of computer systems to increase application availability has become a common industry practice. While clustering does increase the availability of applications and their data to users, it does not solve the problem of a disaster (flood, tornado, earthquake, terrorism, civil unrest, etc.) making the entire cluster, and the applications and data it is serving, unavailable. Distance mirroring of an application's data store allows for recovery from disaster but may still result in unacceptably long periods of downtime. This paper describes a method for stretching a standard Wolfpack (Microsoft™ Cluster Service, MSCS) cluster of Intel architecture servers geographically for disaster tolerance. Server nodes and their storage may be placed at two (or more) distant sites to prevent a single disaster from taking down the entire cluster. Standard cluster semantics and ease of use are maintained using the remote mirroring capabilities of Hewlett-Packard's high-end storage arrays. The design of additional software to control data mirroring behavior when moving or failing over applications between server nodes is described. Also described are software that allows "stretching" the cluster quorum disk between sites in a manner that is transparent to the cluster software, and software for an external arbitrator node that provides rapid recovery from total loss of inter-site communications. Flexibility provided by the array's firmware mirroring options (i.e., synchronous or asynchronous I/O mirroring) allows for optimum use of inter-site link bandwidth based on the data safety requirements of individual applications.
Keywords :
distributed processing; fault tolerant computing; system recovery; workstation clusters; Wolfpack; cluster quorum disk; cluster semantics; clustering; data mirroring; disaster recovery; disaster tolerance; remote mirroring; Application software; Bandwidth; Computer industry; Earthquakes; Floods; Microprogramming; Safety; Software design; Terrorism; Tornadoes
Conference_Title :
Cluster Computing, 2002. Proceedings. 2002 IEEE International Conference on
Print_ISBN :
0-7695-2066-9
DOI :
10.1109/CLUSTR.2002.1137750