DocumentCode :
167470
Title :
CFD Builder: A Library Builder for Computational Fluid Dynamics
Author :
Jayaraj, Jagan ; Pei-Hung Lin ; Woodward, Paul R. ; Pen-Chung Yew
Author_Institution :
Sandia Nat. Labs. Albuquerque, Albuquerque, NM, USA
fYear :
2014
fDate :
19-23 May 2014
Firstpage :
1029
Lastpage :
1038
Abstract :
Computational Fluid Dynamics is an important area in scientific computing. The weak scaling of codes is well understood after about two decades of experience with MPI. As a result, per-node performance has become crucial to the overall machine performance. However, despite the use of multi-threading, obtaining good performance on each core is still extremely challenging. The challenges are primarily due to memory bandwidth limitations and difficulties in using short SIMD engines effectively. This work presents techniques and a tool to improve in-core performance. Fundamental to the strategy is a hierarchical data layout made of small cubical structures of the problem states that can fit well in the cache hierarchy. The difficulties in computing the spatial derivatives (also called near neighbor computation in the literature) in a hierarchical data layout are well known; hence, such a data layout has rarely been used in finite difference codes. This work discusses how to program relatively easily for such a hierarchical data layout, the inefficiencies in this programming strategy, and how to overcome the inefficiencies. The key technique to eliminate the overheads is called pipeline-for-reuse. It is followed by a storage optimization called maximal array contraction. Both pipeline-for-reuse and maximal array contraction are highly tedious and error-prone. Therefore, we built a source-to-source translator called CFD Builder to automate the transformations using directives. The directive-based approach leverages domain experts' knowledge about the code, and eliminates the need for complex analysis before program transformations. We demonstrated the effectiveness of this approach using three different applications on two different architectures and two different compilers. We see up to 6.92x performance improvement using such an approach. We believe such an approach could enable library and application writers to build efficient CFD libraries.
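The hierarchical data layout mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from CFD Builder: the function name `briquette_offset`, the x-fastest ordering, and the sizes used below are assumptions chosen only for illustration. The sketch maps a global cell index to a linear offset when an n×n×n grid is stored as contiguous b×b×b cubes (the keywords call these briquettes).

```python
def briquette_offset(i, j, k, n, b):
    """Linear offset of cell (i, j, k) in an n^3 grid stored as b^3 cubes.

    Hypothetical helper: cells of one cube are contiguous in memory,
    and cubes are themselves laid out in x-fastest order.
    """
    if n % b != 0:
        raise ValueError("grid side n must be a multiple of cube side b")
    nb = n // b                      # cubes per grid side
    bi, li = divmod(i, b)            # cube index and local index along x
    bj, lj = divmod(j, b)            # same along y
    bk, lk = divmod(k, b)            # same along z
    cube = (bk * nb + bj) * nb + bi  # which cube the cell belongs to
    local = (lk * b + lj) * b + li   # position inside that cube
    return cube * b**3 + local


# Neighbors inside one cube stay close in memory, but a neighbor
# across a cube face lands in a different contiguous block.
inside = briquette_offset(3, 0, 0, n=8, b=4)
across = briquette_offset(4, 0, 0, n=8, b=4)
```

With these assumed sizes, the x-neighbors (3, 0, 0) and (4, 0, 0) fall in different cubes, which is the kind of near-neighbor access the abstract describes as difficult in a hierarchical layout.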
Keywords :
cache storage; computational fluid dynamics; finite difference methods; multi-threading; optimisation; parallel processing; program compilers; software architecture; CFD builder; MPI; architectures; cache hierarchy; compilers; computational fluid dynamics; finite difference codes; hierarchical data layout; library builder; maximal array contraction; multi-threading; per-node performance; pipeline-for-reuse; scientific computing; short SIMD engines; storage optimization; Arrays; Computational fluid dynamics; Instruction sets; Layout; Pipeline processing; Programming; CFD; briquette; hierarchical data layout; high performance; source-to-source;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International
Conference_Location :
Phoenix, AZ
Print_ISBN :
978-1-4799-4117-9
Type :
conf
DOI :
10.1109/IPDPSW.2014.117
Filename :
6969494