DocumentCode :
2235155
Title :
Parallel Matrix-Matrix Multiplication Based on HPL with a GPU-Accelerated PC Cluster
Author :
Wang, Qin ; Ohmura, Junichi ; Axida, Shan ; Miyoshi, Takefumi ; Irie, Hidetsugu ; Yoshinaga, Tsutomu
Author_Institution :
Dept. of Inf. Network Syst., Univ. of Electro-Commun., Chofu, Japan
fYear :
2010
fDate :
17-19 Nov. 2010
Firstpage :
243
Lastpage :
248
Abstract :
In this paper, we propose an approach for significantly improving the performance of parallel matrix-matrix multiplication using a GPU-accelerated cluster. For one node, we implement a CPUs-GPU parallel double-precision general matrix-matrix multiplication (dgemm) operation and achieve a performance improvement of 32% as compared to the GPU-only case and 56% as compared to the CPUs-only case. For the entire cluster, we use the overlap GPU acceleration solution to high-performance Linpack (HPL), which eliminates the close dependency between the LU decomposition and the dgemm operation, and achieve a performance improvement of 5.72% as compared to the flat GPU acceleration case.
Keywords :
computer graphic equipment; coprocessors; matrix multiplication; parallel programming; CPU; GPU accelerated PC cluster; HPL; LU decomposition; dgemm operation; high-performance Linpack; parallel matrix-matrix multiplication; performance improvement; GPU; MPI; cluster; heterogeneous; matrix-multiplier; parallelization;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Networking and Computing (ICNC), 2010 First International Conference on
Conference_Location :
Higashi-Hiroshima
Print_ISBN :
978-1-4244-8918-3
Electronic_ISBN :
978-0-7695-4277-5
Type :
conf
DOI :
10.1109/IC-NC.2010.39
Filename :
5695242