• DocumentCode
    167470
  • Title

    CFD Builder: A Library Builder for Computational Fluid Dynamics

  • Author

    Jayaraj, Jagan ; Pei-Hung Lin ; Woodward, Paul R. ; Pen-Chung Yew

  • Author_Institution
    Sandia Nat. Labs. Albuquerque, Albuquerque, NM, USA
  • fYear
    2014
  • fDate
    19-23 May 2014
  • Firstpage
    1029
  • Lastpage
    1038
  • Abstract
    Computational Fluid Dynamics is an important area in scientific computing. The weak scaling of codes is well understood after about two decades of experience using MPI. As a result, per-node performance has become crucial to overall machine performance. However, despite the use of multi-threading, obtaining good performance at each core is still extremely challenging. The challenges are primarily due to memory bandwidth limitations and difficulties in using short SIMD engines effectively. This work presents techniques and a tool to improve in-core performance. Fundamental to the strategy is a hierarchical data layout made of small cubical structures of the problem states that can fit well in the cache hierarchy. The difficulties in computing the spatial derivatives (also called near-neighbor computation in the literature) in a hierarchical data layout are well known; hence, such a data layout has rarely been used in finite difference codes. This work discusses how to program relatively easily for such a hierarchical data layout, the inefficiencies in this programming strategy, and how to overcome them. The key technique to eliminate the overheads is called pipeline-for-reuse. It is followed by a storage optimization called maximal array contraction. Applying pipeline-for-reuse and maximal array contraction by hand is highly tedious and error-prone. Therefore, we built a source-to-source translator called CFD Builder to automate the transformations using directives. The directive-based approach leverages domain experts' knowledge about the code and eliminates the need for complex analysis before program transformations. We demonstrated the effectiveness of this approach using three different applications on two different architectures and two different compilers. We see up to 6.92x performance improvement using such an approach. We believe such an approach could enable library and application writers to build efficient CFD libraries.
  • Keywords
    cache storage; computational fluid dynamics; finite difference methods; multi-threading; optimisation; parallel processing; program compilers; software architecture; CFD builder; MPI; architectures; cache hierarchy; compilers; computational fluid dynamics; finite difference codes; hierarchical data layout; library builder; maximal array contraction; multi-threading; per-node performance; pipeline-for-reuse; scientific computing; short SIMD engines; storage optimization; Arrays; Computational fluid dynamics; Instruction sets; Layout; Pipeline processing; Programming; CFD; briquette; hierarchical data layout; high performance; source-to-source;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International
  • Conference_Location
    Phoenix, AZ
  • Print_ISBN
    978-1-4799-4117-9
  • Type

    conf

  • DOI
    10.1109/IPDPSW.2014.117
  • Filename
    6969494