Title :
Sustained systems performance monitoring at the U.S. Department of Defense High Performance Computing Modernization Program
Author :
Bennett, Paul M.
Author_Institution :
U.S. DoD High Performance Computing Modernization Program, Vicksburg, MS, USA
Abstract :
The U.S. Department of Defense High Performance Computing Modernization Program (HPCMP) has implemented sustained systems performance (SSP) testing on high performance computing systems in use at DoD Supercomputing Resource Centers. The intent is to monitor performance improvements from updates to the operating system, compiler suites, and numerical and communications libraries, and to monitor penalties arising from security patches. In practice, each system's workload is simulated by appropriate choices of user application codes representative of the HPCMP computational technical areas. Past successes include surfacing an imminent failure of an OST in a Cray XT3, incomplete configuration of a scheduler update on an SGI Altix 4700, performance issues associated with a communications library update for a Linux Networx Advanced Technology Cluster, and intermittent resetting of Intel Nehalem cores from turbo mode to standard mode. This history demonstrates that SSP testing is critical to delivering the highest quality of service to HPCMP users.
Keywords :
military computing; parallel machines; performance evaluation; Cray XT3; DoD Supercomputing Resource Centers; HPCMP; Intel Nehalem cores; Linux Networx Advanced Technology Cluster; SGI Altix 4700; U.S. Department of Defense High Performance Computing Modernization Program; communications libraries; compiler suites; High Performance Computing Modernization Program; numerical libraries; operating system; Sustained Systems Performance;
Conference_Title :
High Performance Computing, Networking, Storage and Analysis (SC), 2011 International Conference for
Conference_Location :
Seattle, WA
Electronic_ISBN :
978-1-4503-0771-0