Title :
Balancing CPU-GPU Collaborative High-Order CFD Simulations on the Tianhe-1A Supercomputer
Author :
Chuanfu Xu ; Lilun Zhang ; Xiaogang Deng ; Jianbin Fang ; Guangxue Wang ; Wei Cao ; Yonggang Che ; Yongxian Wang ; Wei Liu
Author_Institution :
Coll. of Comput. Sci., Nat. Univ. of Defense Technol., Changsha, China
Abstract :
HOSTA is an in-house high-order CFD code that can simulate complex flows over complex geometries. Large-scale high-order CFD simulations using HOSTA require massive HPC resources, which motivated us to port it to modern GPU-accelerated supercomputers such as Tianhe-1A. To achieve a greater speedup and fully tap the potential of Tianhe-1A, we have the CPU and GPU work collaboratively in HOSTA instead of using a naive GPU-only approach. We present multiple novel techniques to balance the loads between the store-poor GPU and the store-rich CPU, and to overlap the collaborative computation and communication as far as possible. By taking CPU and GPU load balance into account, we increase the maximum simulation problem size per Tianhe-1A node for HOSTA by 2.3X, and the collaborative approach improves performance by around 45% compared to the GPU-only approach. Scalability tests show that HOSTA can achieve a parallel efficiency of above 60% on 1024 Tianhe-1A nodes. With our method, we have successfully simulated China's large civil airplane configuration C919 on a grid containing 150M cells. To the best of our knowledge, this is the first paper that reports a CPU-GPU collaborative high-order accurate aerodynamic simulation result with such a complex grid geometry.
Keywords :
computational fluid dynamics; flow simulation; graphics processing units; parallel machines; resource allocation; CPU-GPU collaborative high-order CFD simulations; CPU-GPU collaborative high-order accurate aerodynamic simulation; GPU accelerated supercomputers; HOSTA; Tianhe-1A supercomputer; complex grid geometry; in-house high-order CFD software; large scale high-order CFD simulations; load balancing; massive HPC resources; maximum simulation problem size per Tianhe-1A node; naive GPU-only approach; simulated China large civil airplane configuration C919; store-poor GPU; store-rich CPU; Collaboration; Computational fluid dynamics; Computational modeling; Graphics processing units; Kernel; Memory management; Performance evaluation; CFD; CPU-GPU collaboration; GPU parallelization; high-order finite difference scheme;
Conference_Title :
Parallel and Distributed Processing Symposium, 2014 IEEE 28th International
Conference_Location :
Phoenix, AZ
Print_ISBN :
978-1-4799-3799-8
DOI :
10.1109/IPDPS.2014.80