Title :
CFDComm: An Optimized Library for Scalable Point-to-Point Communication for General CFD Applications
Author :
Haeri, S. ; Shrimpton, J.S.
Author_Institution :
Sch. of Eng. Sci., Univ. of Southampton, Southampton, UK
Abstract :
Domain decomposition is the most widely used technique for achieving parallelism in CFD applications. For complicated geometries, graph partitioning programs are usually used to decompose the domain into smaller computational blocks such that the computational load is balanced and the communication cost is minimized. This paper presents and tests an algorithm that avoids deadlocks in the complicated communication patterns inherited from the graph decomposition process. The basic algorithm is implemented in FORTRAN 95 and MPI, and several optimization techniques are then applied to increase the scalability of the library: the addition of topologies, overlap of communication and computation to mask message passing latency, and non-blocking communication. The library is tested on up to 512 cores of the Iridis-3 cluster, which comprises 1008 compute nodes, each with two 2.4 GHz 6-core Westmere processors. I/O and inter-node communication run over a fast Infiniband network composed of groups of 32 nodes connected by DDR links to a 48-port QDR leaf switch. The leaf switches in turn have 4 trunked QDR connections to 4 QDR 48-port core switches.
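Illustrative note (not part of the original abstract): the abstract does not describe the deadlock-avoidance algorithm itself, so the sketch below is not the authors' method. It shows one common way such a problem can be approached, assuming the partitioner yields an undirected neighbour graph of blocks: a greedy edge colouring that places each pairwise exchange in a round where neither partner is already busy. The function name `schedule_exchanges` and the 5-block graph are hypothetical.

```python
from collections import defaultdict


def schedule_exchanges(edges):
    """Greedy edge colouring of a neighbour graph (illustrative only).

    Each edge (a, b) is one pairwise exchange between blocks a and b.
    It is placed in the first round in which neither a nor b already
    has a partner, so every block talks to at most one neighbour per round.
    """
    busy = defaultdict(set)     # block id -> rounds in which it is occupied
    rounds = defaultdict(list)  # round index -> exchanges in that round
    for a, b in sorted(tuple(sorted(e)) for e in edges):
        r = 0
        while r in busy[a] or r in busy[b]:
            r += 1
        busy[a].add(r)
        busy[b].add(r)
        rounds[r].append((a, b))
    return [rounds[r] for r in sorted(rounds)]


# Hypothetical neighbour graph for 5 blocks (an assumption, not from the paper).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
rounds = schedule_exchanges(edges)
print(rounds)
# [[(0, 1), (2, 3)], [(0, 2), (1, 3)], [(1, 2), (3, 4)]]
```

Because both partners of an exchange agree on its round and each block has at most one partner per round, a paired blocking exchange (for example `MPI_Sendrecv`) in each round cannot form a cyclic wait. Non-blocking communication, which the abstract lists among the optimizations, removes the need for such an ordering at the cost of managing outstanding requests.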
Keywords :
FORTRAN; computational fluid dynamics; message passing; multiprocessing systems; network theory (graphs); network topology; optimisation; software libraries; CFD; DDR link; FORTRAN 95; IO communication; Infiniband network; Iridis-3 cluster; MPI; QDR leaf switch; Westmere processor; domain decomposition; frequency 2 GHz; frequency 2.4 GHz; graph decomposition process; graph partitioning program; internode communication; library optimization; message passing latency; nonblocking communication; scalable point-to-point communication; Arrays; Computational fluid dynamics; Libraries; Optimization; Program processors; System recovery; Topology; Computational Fluid Dynamics; Domain Decomposition; Parallel Performance; Point-to-point Communication
Conference_Title :
2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems (HPCC-ICESS)
Conference_Location :
Liverpool
Print_ISBN :
978-1-4673-2164-8
DOI :
10.1109/HPCC.2012.146