Title :
Towards an Autonomic Cluster Management System (ACMS) with Reflex Autonomicity
Author :
Truszkowski, Walt ; Hinchey, Mike ; Sterritt, Roy
Author_Institution :
Inf. Syst. Div., NASA Goddard Space Flight Center, Greenbelt, MD
Abstract :
Cluster computing, whereby a large number of simple processors or nodes are combined so that they appear to function as a single powerful computer, has emerged as a research area in its own right. The approach offers a relatively inexpensive means of providing a fault-tolerant environment and achieving significant computational capabilities for high-performance computing applications. However, the task of manually managing and configuring a cluster quickly becomes daunting as the cluster grows in size. Autonomic computing, with its vision to provide self-management, can potentially solve many of the problems inherent in cluster management. We describe the development of a prototype autonomic cluster management system (ACMS) that exploits autonomic properties to automate cluster management, and its evolution to include reflex reactions via pulse monitoring.
Keywords :
fault tolerant computing; grid computing; mobile agents; workstation clusters; autonomic cluster management system; autonomic computing; cluster computing; computational capability; fault-tolerant environment; high-performance computing; pulse monitoring; reflex autonomicity; reflex reaction; self-management; Availability; Computer networks; Concurrent computing; Energy management; High performance computing; NASA; Power system management; Prototypes; Scalability; Space technology;
Conference_Title :
Parallel and Distributed Systems, 2005. Proceedings. 11th International Conference on
Conference_Location :
Fukuoka
Print_ISBN :
0-7695-2281-5
DOI :
10.1109/ICPADS.2005.281