Title :
NWPerf: a system wide performance monitoring tool for large Linux clusters
Author_Institution :
Pacific Northwest Nat. Lab., Richland, WA, USA
Abstract :
We present NWPerf, a new system for analyzing fine-granularity performance metric data on large-scale supercomputing clusters. The tool measures application efficiency on a system-wide basis, offering both a global system perspective and a detailed view of individual applications. NWPerf provides this service while minimizing the impact on the performance of user applications. We describe the type of information that can be derived from the system, and demonstrate how the system was used to detect and eliminate a performance problem in an application, improving its performance by up to several thousand percent. The NWPerf architecture has proven to be a stable and scalable platform for gathering performance data on a large 1954-CPU production Linux cluster at PNNL.
Keywords :
Linux; parallel machines; performance evaluation; system monitoring; workstation clusters; 1954-CPU production Linux cluster; Linux clusters; NWPerf architecture; fine granularity performance metric data analysis; large-scale supercomputing clusters; system wide performance monitoring tool; user applications; Aggregates; Laboratories; Linux; Measurement; Memory; Monitoring; Performance analysis; Processor scheduling; Relational databases; Statistics;
Conference_Title :
Cluster Computing, 2004 IEEE International Conference on
Print_ISBN :
0-7803-8694-9
DOI :
10.1109/CLUSTR.2004.1392637