Title :
A Flexible and Portable Large-Scale DGEMM Library for Linpack on Next-Generation Multi-GPU Systems
Author :
Rohr, David ; Lindenstruth, Volker
Author_Institution :
Frankfurt Inst. for Adv. Studies, Frankfurt am Main, Germany
Abstract :
In recent years, high performance computing has benefitted greatly from special accelerator cards such as GPUs. Matrix multiplication performed by the BLAS function DGEMM is one of the prime examples where such accelerators excel. DGEMM is the computational hotspot of many tasks, among them the Linpack benchmark. Current GPUs achieve a sustained performance of more than 1 TFLOPS in this task. Since GPUs are connected via PCI Express, multiple GPUs can easily be installed in a single compute node. This enables the construction of multi-TFLOPS systems out of off-the-shelf components. At such high performance, it is often difficult to supply the GPUs with enough data to keep them running at full speed. In this paper we first analyze the scalability of our DGEMM implementation for multiple fast GPUs. We then propose a new scheme optimized for this situation and present its implementation.
Keywords :
graphics processing units; matrix multiplication; parallel processing; peripheral interfaces; Linpack benchmark; PCI express; accelerators excel; computational hotspot; flexible DGEMM library; high performance computing; matrix multiplication; multi-TFLOPS systems; next-generation multi-GPU systems; portable large-scale DGEMM library; single compute node; special accelerator cards; Bandwidth; Benchmark testing; Engines; Graphics processing units; Kernel; Memory management; Next generation networking; HPL Linpack DGEMM OpenCL CUDA BLAS GPU multi-GPU DMA;
Conference_Title :
Parallel, Distributed and Network-Based Processing (PDP), 2015 23rd Euromicro International Conference on
Conference_Location :
Turku
DOI :
10.1109/PDP.2015.89