DocumentCode
1825882
Title
Availability modeling and analysis on high performance cluster computing systems
Author
Song, Hertong ; Leangsuksun, Chokchai Box ; Nassar, Raja ; Gottumukkala, Narasimha Raju ; Scott, Stephen
Author_Institution
Coll. of Eng. & Sci., Louisiana Tech. Univ., Ruston, LA, USA
fYear
2006
fDate
20-22 April 2006
Abstract
Cluster computing has been attracting more and more attention from both industry and academia for its enormous computing power, cost effectiveness, and scalability. Availability is a key system attribute that must be considered at the system design stage and must also reflect the system's actual behavior in operation. System monitoring and logging enable the identification of unplanned events, so that the system's actual availability can be assessed. This paper proposes a single framework that coordinates event monitoring, filtering, data analysis, and dynamic availability modeling. The availability model is abstracted and categorized based on functionality. We describe the proposed architecture and present a sample analysis of real-time event logs from a 512-node cluster at Lawrence Livermore National Laboratory.
Keywords
fault tolerant computing; system monitoring; workstation clusters; data analysis; dynamic availability modeling; event monitoring; high performance cluster computing systems; real time event logs; system design; Availability; Computer industry; Costs; Filtering; High performance computing; Monitoring; Performance analysis; Power system modeling; Scalability; System analysis and design;
fLanguage
English
Publisher
ieee
Conference_Titel
Availability, Reliability and Security, 2006. ARES 2006. The First International Conference on
Print_ISBN
0-7695-2567-9
Type
conf
DOI
10.1109/ARES.2006.37
Filename
1625325