Title :
Acceleration of dRMSD calculation and efficient usage of GPU caches
Author :
Filipovic, Jiri ; Plhak, Jan ; Strelak, David
Author_Institution :
Fac. of Inf., Masaryk Univ., Brno, Czech Republic
Abstract :
In this paper, we introduce a GPU acceleration of the dRMSD algorithm, which is used to compare different structures of a molecule. Compared to a multithreaded CPU implementation, we have reached a 13.4× speedup in clustering and a 62.7× speedup in 1:1 dRMSD computation using a mid-end GPU. The dRMSD computation exhibits strong memory locality and is thus compute-bound. Along with a conservative implementation using shared memory, we have implemented variants of the algorithm that rely on GPU caches to maintain memory locality. Our cache-based implementation reaches 96.5% and 91.6% of the shared memory performance on Fermi and Maxwell, respectively. We have identified several performance pitfalls related to cache blocking in compute-bound codes and suggested optimization techniques to improve performance.
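For readers unfamiliar with the metric, the following is a minimal CPU reference sketch of a dRMSD computation between two conformations of the same molecule. It is not the authors' GPU implementation. It assumes the commonly used definition, in which the squared differences of all intra-molecular pairwise distances are averaged over the N(N-1)/2 atom pairs; the record does not state the normalization used in the paper, so that may differ. The function name `drmsd` and the coordinate layout (lists of `(x, y, z)` tuples) are hypothetical choices made for illustration.

```python
import math


def drmsd(a, b):
    """Distance RMSD between two conformations (illustrative sketch).

    a, b: equal-length sequences of (x, y, z) atom coordinates.
    Assumes normalization over the N*(N-1)/2 unique atom pairs,
    which is one common convention and may not match the paper.
    """
    if len(a) != len(b):
        raise ValueError("conformations must have the same number of atoms")
    n = len(a)
    if n < 2:
        raise ValueError("at least two atoms are required")

    total = 0.0
    # Visit every unique atom pair (i < j) exactly once.
    for i in range(n - 1):
        for j in range(i + 1, n):
            dist_a = math.dist(a[i], a[j])  # pair distance in conformation a
            dist_b = math.dist(b[i], b[j])  # same pair in conformation b
            total += (dist_a - dist_b) ** 2

    pairs = n * (n - 1) // 2
    return math.sqrt(total / pairs)
```

Each atom's coordinates are reused for N-1 pair distances, so O(N) data feeds O(N²) arithmetic. This is the memory locality that the abstract says makes the computation compute-bound.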
Keywords :
cache storage; graphics processing units; multi-threading; optimisation; pattern clustering; shared memory systems; Fermi; GPU acceleration; GPU caches; Maxwell; cache blocking; clustering speedup; compute-bound codes; dRMSD calculation; memory locality; mid-end GPU; multithreaded CPU implementation; optimization techniques; shared memory; shared memory performance; Bandwidth; Computer architecture; Graphics processing units; Instruction sets; Kernel; Optimization; Registers; GPU; RMSD; cache; code optimization
Conference_Title :
High Performance Computing & Simulation (HPCS), 2015 International Conference on
Conference_Location :
Amsterdam
Print_ISBN :
978-1-4673-7812-3
DOI :
10.1109/HPCSim.2015.7237020