• DocumentCode
    166603
  • Title
    It takes a village: Monitoring the Blue Waters supercomputer
  • Author
    Semeraro, B.D. ; Sisneros, Robert ; Fullop, Joshi ; Bauer, Gregory H.
  • Author_Institution
    Nat. Center for Supercomput. Applic., Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA
  • fYear
    2014
  • fDate
    22-26 Sept. 2014
  • Firstpage
    392
  • Lastpage
    399
  • Abstract
    The performance of science applications on modern HPC equipment depends on many factors. Architectural features, individual hardware characteristics, and scheduler traits all affect how a particular application performs, not only in isolation but also when run in concert with other user applications. Being able to correlate system events and conditions at particular times can give insight into the causes of good or bad performance. Unfortunately, the information we seek is not necessarily in a readily accessible form. The problem at hand is how to enable efficient querying of the raw data and flexible graphical representation of the results. Web applications that access an underlying database provide this sort of functionality quite well for many science applications. Our data access scenario is not very different. The data collected for a large HPC environment is complex and grows in size with time, which sets it apart from applications that deal with more static data. It is the dynamic nature of the data that makes the problem interesting. In this work, we present our approach to the analysis and visualization of HPC system performance data, based on database access and web-based graphical presentation. We discuss the details of how data is collected and processed from raw logs into the database, how queries are formulated, and how the data is graphically displayed. This process includes dynamic formulation of the queries. Finally, we discuss how the system is used to analyze system performance.
  • Keywords
    computerised monitoring; data visualisation; parallel architectures; parallel machines; performance evaluation; processor scheduling; query formulation; Blue Waters supercomputer monitoring; HPC environment; HPC system performance data analysis; HPC system performance data visualization; Web based graphical presentation; architectural features; data access; database access; hardware characteristics; modern HPC equipment; queries dynamic formulation; scheduler traits; system events; system performance analysis; Buildings; Data collection; Data visualization; Databases; Measurement; Monitoring; System performance;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster Computing (CLUSTER), 2014 IEEE International Conference on
  • Conference_Location
    Madrid
  • Type
    conf
  • DOI
    10.1109/CLUSTER.2014.6968671
  • Filename
    6968671