• DocumentCode
    3145552
  • Title
    Comprehensive Performance Monitoring for GPU Cluster Systems
  • Author
    Fürlinger, Karl ; Wright, Nicholas J. ; Skinner, David
  • Author_Institution
    Comput. Sci. Dept., Ludwig-Maximilians-Univ. (LMU) Munich, Munich, Germany
  • fYear
    2011
  • fDate
    16-20 May 2011
  • Firstpage
    1377
  • Lastpage
    1386
  • Abstract
    Accelerating applications with GPUs has recently attracted considerable interest from the scientific computing community. While tools for optimizing individual kernels are readily available, there is a lack of support for the specific needs of the HPC area. Most importantly, integration with existing parallel programming models (MPI and threading) and scalability to the full size of the machine are required. To address these issues, we present our work on monitoring and performance evaluation of the CUDA runtime environment in the context of our scalable and efficient profiling tool IPM. We derive metrics for GPU utilization and identify missed opportunities for GPU-CPU overlap. We evaluate the monitoring accuracy and overheads of our approach and apply it to a full scientific application.
  • Keywords
    parallel programming; software performance evaluation; system monitoring; workstation clusters; CUDA runtime environment; GPU cluster systems; HPC; MPI; comprehensive performance monitoring; parallel programming models; performance evaluation; threading; Acceleration; Graphics processing unit; Kernel; Libraries; Monitoring; Runtime; Timing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), 2011 IEEE International Symposium on
  • Conference_Location
    Shanghai
  • ISSN
    1530-2075
  • Print_ISBN
    978-1-61284-425-1
  • Electronic_ISBN
    1530-2075
  • Type
    conf
  • DOI
    10.1109/IPDPS.2011.289
  • Filename
    6008992