Title of article :
Efficient utilization of launched threads on GPUs: The spherical harmonic transform as a case study
Author/Authors :
Feng-shun Lu, Junqiang Song, Wang-qun Lin, Yu-fei Pang, Kai-jun Ren, Pei-chang Shi
Issue Information :
Monthly, serial-numbered issue, year 2013
Abstract :
Maximum utilization of hardware resources is crucial to leveraging the enormous computational power of graphics processing units (GPUs). However, there is no effective metric that indicates whether the launched threads are kept busy. To address this issue, we propose a metric called ETU to describe the efficiency of thread utilization. First, we execute several CUDA SDK sample codes, with and without double-precision arithmetic, on two generations of GPUs to perform a preliminary validation of the ETU metric. Taking the spherical harmonic transform as an example, we then give two GPU implementations of Legendre transforms and examine the relationship between ETU and application performance. Experimental results show that applications with larger ETU usually achieve better performance, and that ETU predicts performance more accurately than the occupancy metric proposed by NVIDIA. Finally, we select the better-performing GPU implementations to accelerate Legendre transforms in STSWM, a spectral transform shallow water model.
Keywords :
Arithmetic precision, Thread utilization, GPU, Spherical harmonic transform, Occupancy
Journal title :
Computer Physics Communications