DocumentCode :
560209
Title :
SPOTlight on testing: Stability, performance and operational testing of LANL HPC clusters
Author :
Pedicini, Georgia ; Green, Jennifer
Author_Institution :
High Performance Comput. Syst., Los Alamos Nat. Lab., Los Alamos, NM, USA
fYear :
2011
fDate :
12-18 Nov. 2011
Firstpage :
1
Lastpage :
8
Abstract :
Testing is sometimes a forgotten component of system management, but it becomes very important in the realm of High Performance Computing (HPC) clusters. Many large-scale HPC cluster installations are one of a kind, with unknown issues and unexpected behaviors. First, the initial installation may uncover complex configuration interactions that are only apparent at scale; Stability becomes a critical feature of early system testing. Second, Performance may be significantly impacted by small changes to the system. Third, after initial shakeout, users expect a system that is reliable on their terms; ongoing Operational tests verify reliability, and provide early warning of developing problems. A robust test suite should address all of these test categories, and present both tests and results in a manner that meets usability requirements. We will describe Los Alamos National Laboratory's current test suite, and the development project to expand the suite to cover these areas and provide better tools for analysis and reporting.
Keywords :
program testing; software reliability; LANL HPC clusters; SPOTlight; high performance computing clusters; large-scale HPC cluster installation; operational testing; stability; system management; system testing; usability requirements; Hardware; Maintenance engineering; Measurement; Stability analysis; Supercomputers; Testing; Accessibility; High Performance Computing; Operational Testing; Performance testing; RAS; Reliability; SPOT; Serviceability; Stability Testing; Test Driven Development; Test framework;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
High Performance Computing, Networking, Storage and Analysis (SC), 2011 International Conference for
Conference_Location :
Seattle, WA
Electronic_ISBN :
978-1-4503-0771-0
Type :
conf
Filename :
6114477