DocumentCode :
1640882
Title :
Symmetric Active/Active High Availability for High-Performance Computing System Services: Accomplishments and Limitations
Author :
Engelmann, C. ; Scott, S.L. ; Leangsuksun, C. ; He, X.
Author_Institution :
Comput. Sci. & Math. Div., Oak Ridge Nat. Lab., Oak Ridge, TN
fYear :
2008
Firstpage :
813
Lastpage :
818
Abstract :
This paper summarizes our efforts over the last 3-4 years in providing symmetric active/active high availability for high-performance computing (HPC) system services. This work paves the way for high-level reliability, availability and serviceability in extreme-scale HPC systems by focusing on the most critical components, head and service nodes, and by reinforcing them with appropriate high availability solutions. This paper presents our accomplishments in the form of concepts and respective prototypes, discusses existing limitations, outlines possible future work, and describes the relevance of this research to other, planned efforts.
Keywords :
software fault tolerance; active high availability; high-level reliability; high-performance computing system services; serviceability; symmetric active availability; Availability; Computer science; Grid computing; Laboratories; Magnetic heads; Mathematics; Prototypes; Taxonomy; Telecommunication computing; USA Councils;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cluster Computing and the Grid, 2008. CCGRID '08. 8th IEEE International Symposium on
Conference_Location :
Lyon
Print_ISBN :
978-0-7695-3156-4
Electronic_ISBN :
978-0-7695-3156-4
Type :
conf
DOI :
10.1109/CCGRID.2008.78
Filename :
4534309