DocumentCode
3145552
Title
Comprehensive Performance Monitoring for GPU Cluster Systems
Author
Fürlinger, Karl ; Wright, Nicholas J. ; Skinner, David
Author_Institution
Comput. Sci. Dept., Ludwig-Maximilians-Univ. (LMU) Munich, Munich, Germany
fYear
2011
fDate
16-20 May 2011
Firstpage
1377
Lastpage
1386
Abstract
Accelerating applications with GPUs has recently attracted considerable interest from the scientific computing community. While tools for optimizing individual kernels are readily available, support for the specific needs of HPC is lacking. Most importantly, integration with existing parallel programming models (MPI and threading) and scalability to the full size of the machine are required. To address these issues, we present our work on monitoring and performance evaluation of the CUDA runtime environment in the context of IPM, our scalable and efficient profiling tool. We derive metrics for GPU utilization and identify missed opportunities for GPU-CPU overlap. We evaluate the monitoring accuracy and overhead of our approach and apply it to a full scientific application.
Keywords
parallel programming; software performance evaluation; system monitoring; workstation clusters; CUDA runtime environment; GPU cluster systems; HPC; MPI; comprehensive performance monitoring; parallel programming models; performance evaluation; threading; Acceleration; Graphics processing unit; Kernel; Libraries; Monitoring; Runtime; Timing
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), 2011 IEEE International Symposium on
Conference_Location
Shanghai
ISSN
1530-2075
Print_ISBN
978-1-61284-425-1
Electronic_ISBN
1530-2075
Type
conf
DOI
10.1109/IPDPS.2011.289
Filename
6008992