DocumentCode :
2549016
Title :
Acceleration of Streamed Tensor Contraction Expressions on GPGPU-Based Clusters
Author :
Ma, Wenjing ; Krishnamoorthy, Sriram ; Villa, Oreste ; Kowalski, Karol
Author_Institution :
Ohio State Univ., Columbus, OH, USA
fYear :
2010
fDate :
20-24 Sept. 2010
Firstpage :
207
Lastpage :
216
Abstract :
Tensor contractions are generalized multidimensional matrix multiplication operations that occur widely in quantum chemistry. Efficient execution of tensor contractions on GPUs requires addressing several challenges, including index permutation and small dimension sizes that reduce thread block utilization. In this paper, we present our approach to automatically generating CUDA code to execute tensor contractions on GPUs, including management of data movement between CPU and GPU. GPU-enabled code is generated for the most expensive contractions in CCSD(T), a key coupled cluster method, and incorporated into NWChem, a popular computational chemistry suite. We demonstrate a speedup of over a factor of 8.4 using one core per node, and of over 2.6 when utilizing the entire system with a hybrid CPU+GPU solution with 2 GPUs and 5 cores. Finally, we analyze the implementation behavior on future GPU systems.
Keywords :
computer graphic equipment; coprocessors; matrix multiplication; pattern clustering; program compilers; quantum chemistry; tensors; CPU; CUDA; GPGPU based cluster; GPU; code generation; data movement management; multidimensional matrix multiplication; quantum chemistry; streamed tensor contraction; Generators; Graphics processing unit; Indexes; Instruction sets; Kernel; Optimization; Tensile stress; GPGPU clusters; hybrid execution; tensor contractions;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Cluster Computing (CLUSTER), 2010 IEEE International Conference on
Conference_Location :
Heraklion, Crete
Print_ISBN :
978-1-4244-8373-0
Electronic_ISBN :
978-0-7695-4220-1
Type :
conf
DOI :
10.1109/CLUSTER.2010.26
Filename :
5600307