DocumentCode
598593
Title
Scalable multi-GPU 3-D FFT for TSUBAME 2.0 Supercomputer
Author
Nukada, Akira ; Sato, Kento ; Matsuoka, Satoshi
Author_Institution
Tokyo Inst. of Technol., Tokyo, Japan
fYear
2012
fDate
10-16 Nov. 2012
Firstpage
1
Lastpage
10
Abstract
For scalable 3-D FFT computation on multiple GPUs, efficient all-to-all communication between the GPUs is the most important factor for good performance. Implementations based on point-to-point MPI library functions and CUDA memory copy APIs typically exhibit very large overheads, especially for the small message sizes that arise in all-to-all communication between many nodes. We propose several schemes to minimize these overheads, including the use of a lower-level InfiniBand API to effectively overlap intra-node and inter-node communication, as well as auto-tuning strategies to control scheduling and determine rail assignments. As a result, we achieve very good strong scalability as well as good performance, reaching 4.8 TFLOPS in double precision on 256 nodes (768 GPUs) of the TSUBAME 2.0 Supercomputer.
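For context, the following is a minimal sketch of the conventional staging path that the abstract identifies as the source of overhead: GPU data is copied to pinned host memory with the CUDA memory copy API and then exchanged with point-to-point MPI calls. The block size, buffer layout, and the use of nonblocking MPI_Isend/MPI_Irecv are illustrative assumptions for this sketch; it is not the paper's InfiniBand-verbs implementation.

/* Baseline all-to-all for a distributed 3-D FFT: stage GPU data through
 * pinned host memory (cudaMemcpy) and exchange it with point-to-point MPI.
 * The paper replaces this path with a lower-level InfiniBand scheme; the
 * block size and message layout here are illustrative assumptions only. */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* One double-complex block per destination rank (size is arbitrary);
     * with many nodes each block shrinks and per-message overhead dominates. */
    const size_t block   = 1 << 14;                    /* elements per peer */
    const size_t doubles = block * 2;                  /* re + im           */
    const size_t bytes   = doubles * nprocs * sizeof(double);

    double *d_buf, *h_send, *h_recv;
    cudaMalloc(&d_buf, bytes);           /* device-side FFT data        */
    cudaMemset(d_buf, 0, bytes);         /* placeholder for FFT output  */
    cudaMallocHost(&h_send, bytes);      /* pinned host staging buffers */
    cudaMallocHost(&h_recv, bytes);

    /* 1. Copy the local data from the GPU to the host staging buffer. */
    cudaMemcpy(h_send, d_buf, bytes, cudaMemcpyDeviceToHost);

    /* 2. Exchange one block with every rank using nonblocking point-to-point
     *    messages (the baseline the abstract refers to). */
    MPI_Request *reqs = (MPI_Request *)malloc(2 * nprocs * sizeof(MPI_Request));
    for (int p = 0; p < nprocs; ++p) {
        MPI_Irecv(h_recv + (size_t)p * doubles, (int)doubles, MPI_DOUBLE,
                  p, 0, MPI_COMM_WORLD, &reqs[2 * p]);
        MPI_Isend(h_send + (size_t)p * doubles, (int)doubles, MPI_DOUBLE,
                  p, 0, MPI_COMM_WORLD, &reqs[2 * p + 1]);
    }
    MPI_Waitall(2 * nprocs, reqs, MPI_STATUSES_IGNORE);
    free(reqs);

    /* 3. Copy the received blocks back to the GPU for the next FFT stage. */
    cudaMemcpy(d_buf, h_recv, bytes, cudaMemcpyHostToDevice);

    cudaFree(d_buf);
    cudaFreeHost(h_send);
    cudaFreeHost(h_recv);
    MPI_Finalize();
    return 0;
}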
Keywords
application program interfaces; fast Fourier transforms; graphics processing units; mainframes; message passing; parallel architectures; parallel machines; 3D FFT computation scalability; 4.8TFLOPS; CUDA memory copy API; InfiniBand; TSUBAME 2.0 supercomputer; all-to-all communications; autotuning strategies; double precision; fast Fourier transform; internode communication; intranode communication; multiple GPU; point-to-point MPI library functions; Data transfer; Graphics processing units; Libraries; Peer to peer computing; Rails; Random access memory; Scalability;
fLanguage
English
Publisher
ieee
Conference_Titel
2012 International Conference for High Performance Computing, Networking, Storage and Analysis (SC)
Conference_Location
Salt Lake City, UT
ISSN
2167-4329
Print_ISBN
978-1-4673-0805-2
Type
conf
DOI
10.1109/SC.2012.100
Filename
6468483
Link To Document