Title :
Scalable multi-GPU 3-D FFT for TSUBAME 2.0 Supercomputer
Author :
Nukada, Akira ; Sato, Kento ; Matsuoka, Satoshi
Author_Institution :
Tokyo Inst. of Technol., Tokyo, Japan
Abstract :
For scalable 3-D FFT computation on multiple GPUs, efficient all-to-all communication between GPUs is the most important factor in achieving good performance. Implementations that rely on point-to-point MPI library functions and CUDA memory copy APIs typically exhibit very large overheads, especially for the small message sizes that arise in all-to-all communication among many nodes. We propose several schemes to minimize these overheads, including the use of a lower-level InfiniBand API to effectively overlap intra- and inter-node communication, as well as auto-tuning strategies to control scheduling and determine rail assignments. As a result, we achieve very good strong scalability as well as good performance: up to 4.8 TFLOPS in double precision using 256 nodes (768 GPUs) of the TSUBAME 2.0 Supercomputer.
Keywords :
application program interfaces; fast Fourier transforms; graphics processing units; mainframes; message passing; parallel architectures; parallel machines; 3D FFT computation scalability; 4.8TFLOPS; CUDA memory copy API; InfiniBand; TSUBAME 2.0 supercomputer; all-to-all communications; autotuning strategies; double precision; fast Fourier transform; internode communication; intranode communication; multiple GPU; point-to-point MPI library functions; Data transfer; Graphics processing units; Libraries; Peer to peer computing; Rails; Random access memory; Scalability;
Conference_Title :
High Performance Computing, Networking, Storage and Analysis (SC), 2012 International Conference for
Conference_Location :
Salt Lake City, UT
Print_ISBN :
978-1-4673-0805-2
DOI :
10.1109/SC.2012.100