Title :
BQCD with GPI: A case study
Author :
Grünewald, Daniel
Author_Institution :
Fraunhofer ITWM, Kaiserslautern, Germany
Abstract :
We compare the performance of BQCD, a typical high-performance computing application, using either the MPI or the Fraunhofer GPI communication library. In our analysis, we focus on the performance-critical part of BQCD, which accounts for 50 percent of the total program run-time. This part is the computation of a four-dimensional nearest-neighbor stencil operator in a domain-decomposed simulation volume. Hence, BQCD is a typical representative of the broad class of stencil algorithms. To obtain optimal speedup, we overlap the communication with the computation and analyse the resulting run-times on two test systems. We introduce the overlap efficiency as a measure of the communication library's ability to overlap communication with computation. In the regime in which the raw communication latency is less than the raw computational time, the overlap efficiency should be equal to one. This regime depends on the problem size and on the number of cores used. Deviations from one indicate possible interference between communication and computation induced by the communication library, side effects which impair scalability in practice. As a result, we find that GPI has an overlap efficiency equal to one, i.e. it allows for perfect overlap and ideal scalability: the total run-time is equal to the time spent on the pure computation. For the same communication pattern, MPI has an overlap efficiency of less than one. It cannot hide the communication completely, which in general results in worse scalability. The GPI speedups in comparison with the equivalent MPI implementation are of the order of 20-30 percent.
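To illustrate the overlap efficiency described above, the following minimal sketch assumes one possible definition, which the abstract itself does not spell out: the ratio of the ideal overlapped run-time, max(t_comp, t_comm), to the measured total run-time. The function name, its parameters, and the timing values are hypothetical and not taken from the paper.

```python
def overlap_efficiency(t_comp: float, t_comm: float, t_total: float) -> float:
    """Assumed definition: ideal overlapped run-time divided by measured run-time.

    t_comp  -- raw computation time without any communication
    t_comm  -- raw communication time without any computation
    t_total -- measured run-time with communication and computation overlapped
    """
    if t_total <= 0:
        raise ValueError("t_total must be positive")
    # With perfect overlap, the shorter phase is hidden behind the longer one.
    ideal = max(t_comp, t_comm)
    return ideal / t_total


# Hypothetical timings in seconds (illustrative only).
# Communication fully hidden: total run-time equals pure computation time.
perfect = overlap_efficiency(t_comp=2.0, t_comm=0.5, t_total=2.0)
# Communication only partly hidden: total run-time exceeds computation time.
partial = overlap_efficiency(t_comp=2.0, t_comm=0.5, t_total=2.5)
print(perfect, partial)
```

Under this assumed definition, a value of one in the regime t_comm < t_comp corresponds to a total run-time equal to the pure computation time, and values below one indicate communication that was not hidden.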
Keywords :
application program interfaces; message passing; software libraries; BQCD performance; Fraunhofer GPI communication library; MPI; communication interference; communication latency; communication pattern; computational time; four-dimensional nearest neighbor stencil operator; global address space programming interface; high performance computer application; overlap efficiency; run-time; stencil algorithm; total runtime; Computational modeling; Electronics packaging; Instruction sets; Lattices; Libraries; Runtime; Synchronization; 4D stencil code BQCD; GPI communication library; PGAS programming model; Parallelisation on HPC platforms;
Conference_Title :
High Performance Computing and Simulation (HPCS), 2012 International Conference on
Conference_Location :
Madrid
Print_ISBN :
978-1-4673-2359-8
DOI :
10.1109/HPCSim.2012.6266942