Title :
Fastpath Optimizations for Cluster Recovery in Shared-Disk Systems
Author_Institution :
Johns Hopkins University
Abstract :
We describe the design and implementation of a clustering service for a high-performance, shared-disk file system. The service provides failure detection and recovery, reliable end-to-end messaging, and a centralized, recoverable management interface. We implement novel optimizations in the voting protocol that resolves cluster membership. These optimizations allow clusters to form as quickly as possible without introducing livelock or requiring careful tuning of timeout parameters. Our treatment includes performance results that quantify the scalability of the system and measure recovery times.
Keywords :
Clustering algorithms; Computer science; Delay; Design engineering; File systems; Permission; Protocols; Reliability engineering; Runtime; Voting;
Conference_Title :
Supercomputing, 2004. Proceedings of the ACM/IEEE SC2004 Conference
Print_ISBN :
0-7695-2153-3