Balancing CPU-GPU Collaborative High-Order CFD Simulations on the Tianhe-1A Supercomputer

Author

Chuanfu Xu ; Lilun Zhang ; Xiaogang Deng ; Jianbin Fang ; Guangxue Wang ; Wei Cao ; Yonggang Che ; Yongxian Wang ; Wei Liu

Author_Institution

Coll. of Comput. Sci., Nat. Univ. of Defense Technol., Changsha, China

fYear

2014

fDate

19-23 May 2014

Firstpage

725

Lastpage

734

Abstract

HOSTA is an in-house high-order CFD software package that can simulate complex flows over complex geometries. Large-scale high-order CFD simulations using HOSTA require massive HPC resources, which motivates us to port it to modern GPU-accelerated supercomputers such as Tianhe-1A. To achieve a greater speedup and fully tap the potential of Tianhe-1A, we use the CPU and GPU collaboratively for HOSTA instead of taking a naive GPU-only approach. We present multiple novel techniques to balance the load between the store-poor GPU and the store-rich CPU, and to overlap the collaborative computation and communication as far as possible. Taking CPU and GPU load balance into account, we improve the maximum simulation problem size per Tianhe-1A node for HOSTA by 2.3X, while the collaborative approach improves performance by around 45% compared to the GPU-only approach. Scalability tests show that HOSTA achieves a parallel efficiency above 60% on 1024 Tianhe-1A nodes. With our method, we have successfully simulated China's large civil airplane configuration C919 containing 150M grid cells. To the best of our knowledge, this is the first paper to report a CPU-GPU collaborative high-order accurate aerodynamic simulation result with such a complex grid geometry.

Keywords

computational fluid dynamics; flow simulation; graphics processing units; parallel machines; resource allocation; CPU-GPU collaborative high-order CFD simulations; CPU-GPU collaborative high-order accurate aerodynamic simulation; GPU accelerated supercomputers; HOSTA; Tianhe-1A supercomputer; complex grid geometry; in-house high-order CFD software; large scale high-order CFD simulations; load balancing; massive HPC resources; maximum simulation problem size per Tianhe-1A node; naive GPU-only approach; simulated China large civil airplane configuration C919; store-poor GPU; store-rich CPU; Collaboration; Computational fluid dynamics; Computational modeling; Graphics processing units; Kernel; Memory management; Performance evaluation; CFD; CPU-GPU collaboration; GPU parallelization; high-order finite difference scheme

fLanguage

English

Publisher

IEEE

Conference_Titel

Parallel and Distributed Processing Symposium, 2014 IEEE 28th International

Conference_Location

Phoenix, AZ

ISSN

1530-2075

Print_ISBN

978-1-4799-3799-8

Type

conf

DOI

10.1109/IPDPS.2014.80

Filename

6877304