Title :
Push Me Pull You: Integrating Opposing Data Transport Modes for Efficient HPC Application Monitoring
Author :
Omar Aaziz;Jonathan Cook;Hadi Sharifi
Author_Institution :
Comput. Sci. Dept., New Mexico State Univ., Las Cruces, NM, USA
Abstract :
While HPC system monitoring is a necessary and accepted practice, applications are still basically opaque in the production environment. For better HPC platform management and utilization, especially as platforms push towards exascale size, HPC applications need to be more transparent in their execution in the production environment. PROMON is a framework for application monitoring in the production environment, but its design concentrated on the front-end issue of offering easy-to-use application instrumentation. This paper presents the integration of PROMON with LDMS, a proven, efficient HPC system monitoring framework. PROMON and LDMS offer a case study in integrating two disparate instrumentation and monitoring models, and the lessons are applicable to other HPC monitoring issues.
Keywords :
"Monitoring","Instruments","Production","Data collection","Measurement","Data structures","Libraries"
Conference_Title :
Cluster Computing (CLUSTER), 2015 IEEE International Conference on
DOI :
10.1109/CLUSTER.2015.118