Authors:
Haase, Gundolf (gundolf.hasse@uni-graz.at), Institute for Mathematics and Scientific Computing, University of Graz, Austria
Abstract:
Cardiovascular simulations require the solution of coupled PDE/ODE equations with several
internal couplings of the PDEs depending on the underlying model and the available compute
capabilities. The MPI/OpenMP and GPU (CUDA) parallelization of the ODE solver and the
elliptic/parabolic potential problem solver has been successfully performed in the past. We use
a CG iteration with an algebraic multigrid (AMG) preconditioner for the elliptic problem, and its
general parallelization will be presented in the talk.
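As an illustration of this solver component, the following minimal C++ sketch shows a preconditioned CG iteration; the simple CSR matrix type and the Jacobi (diagonal) preconditioner are assumptions standing in for the matrix format and the AMG V-cycle of the actual code.

    #include <vector>
    #include <cmath>
    #include <cstddef>

    // Minimal CSR matrix for the sketch (the real code uses its own format).
    struct CsrMatrix {
        std::vector<int>    row_ptr, col_idx;
        std::vector<double> val;
        std::size_t n() const { return row_ptr.size() - 1; }
    };

    static std::vector<double> spmv(const CsrMatrix& A, const std::vector<double>& x) {
        std::vector<double> y(A.n(), 0.0);
        for (std::size_t i = 0; i < A.n(); ++i)
            for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
                y[i] += A.val[k] * x[A.col_idx[k]];
        return y;
    }

    static double dot(const std::vector<double>& a, const std::vector<double>& b) {
        double s = 0.0;
        for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
        return s;
    }

    // Jacobi (diagonal) preconditioner as a placeholder for the AMG V-cycle.
    static std::vector<double> precond(const CsrMatrix& A, const std::vector<double>& r) {
        std::vector<double> z(r.size(), 0.0);
        for (std::size_t i = 0; i < A.n(); ++i)
            for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
                if (A.col_idx[k] == (int)i) z[i] = r[i] / A.val[k];
        return z;
    }

    // Preconditioned CG: solves A x = b for a symmetric positive definite matrix A.
    void pcg(const CsrMatrix& A, const std::vector<double>& b,
             std::vector<double>& x, int max_it = 200, double tol = 1e-8) {
        x.assign(b.size(), 0.0);
        std::vector<double> r = b;                  // r = b - A*x with x = 0
        std::vector<double> z = precond(A, r);      // z = M^{-1} r (AMG V-cycle in practice)
        std::vector<double> p = z;
        double rho = dot(r, z);
        for (int it = 0; it < max_it && std::sqrt(dot(r, r)) > tol; ++it) {
            std::vector<double> Ap = spmv(A, p);
            double alpha = rho / dot(p, Ap);
            for (std::size_t i = 0; i < x.size(); ++i) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
            z = precond(A, r);
            double rho_new = dot(r, z);
            for (std::size_t i = 0; i < p.size(); ++i) p[i] = z[i] + (rho_new / rho) * p[i];
            rho = rho_new;
        }
    }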
This parallelization concept has been revisited with respect to load balancing of the subdomain
interfaces of the decomposed domain and resulted in much better strong parallel efficiency,
especially on clusters of GPUs. The matrices for
the potential problems remain unchanged during the whole calculation, i.e., the matrices are
computed and assembled on the CPU and transferred only once to the GPU. The same holds for
the AMG setup. In order to provide a GPU solver for elasticity, we extended the AMG to coupled
problems. Here, several versions have been investigated, and the AMG with coupled degrees of
freedom in each node, together with a graph coarsening, showed the best robustness and the best
timings.
The relation between the costs for setup and solver changes completely in the case of non-linear
elasticity. The original CPU code spent 50% […]. The assumption that the matrix graph does not
change during the non-linear calculation supports the GPU acceleration of this step (and also the
CPU acceleration). The assembly of the local contributions into the global stiffness matrix on the
GPU is the subject of ongoing work.
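Under the fixed-graph assumption, the GPU assembly can be sketched as follows: the position of every local element entry in the global CSR value array is precomputed once on the CPU, and the kernel merely accumulates the local contributions with atomic additions. The kernel and its arguments below are an illustrative sketch, not the actual implementation.

    #include <cuda_runtime.h>

    // One thread per element; each element carries NLOC*NLOC local stiffness entries.
    // 'scatter' maps every local entry to its (precomputed, fixed) index in the global
    // CSR value array -- valid as long as the matrix graph does not change.
    // Double-precision atomicAdd requires compute capability >= 6.0.
    template <int NLOC>
    __global__ void assemble_fixed_pattern(int num_elements,
                                           const double* __restrict__ local_ke,   // [num_elements * NLOC * NLOC]
                                           const int*    __restrict__ scatter,    // [num_elements * NLOC * NLOC]
                                           double*       __restrict__ csr_values) // global matrix values
    {
        const int e = blockIdx.x * blockDim.x + threadIdx.x;
        if (e >= num_elements) return;
        const int base = e * NLOC * NLOC;
        for (int k = 0; k < NLOC * NLOC; ++k)
            atomicAdd(&csr_values[scatter[base + k]], local_ke[base + k]);
    }

    // Example launch (host side), assuming 12 local DOFs per tetrahedral element:
    // assemble_fixed_pattern<12><<<(num_elements + 255) / 256, 256>>>(num_elements, d_ke, d_scatter, d_values);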
Additionally, the deformed geometry requires a mesh smoothing, which will be provided on the
basis of radial basis functions (RBFs). A similar assumption will be used for the AMG setup on
the CPU/GPU, resulting in several setup entry points with very different computational costs and
data transfers between CPU and GPU.
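The RBF-based mesh smoothing mentioned above can be sketched as a scattered-data interpolation: prescribed boundary displacements determine the RBF weights from a small dense system, and the weights are then evaluated at the interior nodes. The Gaussian kernel, the naive dense solver and all names below are assumptions made only for this illustration.

    #include <cmath>
    #include <vector>
    #include <cstddef>

    struct Point { double x, y, z; };

    // Gaussian RBF (the kernel choice is an assumption; thin-plate splines etc. work as well).
    static double rbf(const Point& a, const Point& b, double shape = 1.0) {
        const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
        return std::exp(-shape * (dx * dx + dy * dy + dz * dz));
    }

    // Solve the dense system Phi * w = d with naive Gaussian elimination (illustration only).
    static std::vector<double> solve_dense(std::vector<std::vector<double>> A, std::vector<double> b) {
        const std::size_t n = b.size();
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t i = k + 1; i < n; ++i) {
                const double f = A[i][k] / A[k][k];
                for (std::size_t j = k; j < n; ++j) A[i][j] -= f * A[k][j];
                b[i] -= f * b[k];
            }
        std::vector<double> x(n);
        for (std::size_t i = n; i-- > 0;) {
            double s = b[i];
            for (std::size_t j = i + 1; j < n; ++j) s -= A[i][j] * x[j];
            x[i] = s / A[i][i];
        }
        return x;
    }

    // Propagate one displacement component from the boundary nodes to the interior nodes.
    std::vector<double> rbf_smooth(const std::vector<Point>& boundary,
                                   const std::vector<double>& boundary_disp,
                                   const std::vector<Point>& interior) {
        const std::size_t nb = boundary.size();
        std::vector<std::vector<double>> Phi(nb, std::vector<double>(nb));
        for (std::size_t i = 0; i < nb; ++i)
            for (std::size_t j = 0; j < nb; ++j)
                Phi[i][j] = rbf(boundary[i], boundary[j]);
        const std::vector<double> w = solve_dense(Phi, boundary_disp);   // RBF weights

        std::vector<double> interior_disp(interior.size(), 0.0);
        for (std::size_t i = 0; i < interior.size(); ++i)
            for (std::size_t j = 0; j < nb; ++j)
                interior_disp[i] += w[j] * rbf(interior[i], boundary[j]);
        return interior_disp;
    }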
Besides the setup phase, the full non-linear iteration algorithm will run completely on the GPU.
Taking also into account the dramatically reduced data transfer between host and device, we expect
an acceleration of the non-linear iteration by a factor of 30 with respect to one CPU core.
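To indicate what running completely on the GPU means for the data traffic, the following toy Thrust example keeps all vectors resident on the device and transfers only a single scalar (the residual norm) back to the host per iteration; the residual and update functors are simple stand-ins, not the actual cardiac model.

    #include <thrust/device_vector.h>
    #include <thrust/transform.h>
    #include <thrust/inner_product.h>
    #include <cmath>
    #include <cstdio>

    // Stand-ins for the residual evaluation and the correction step of a non-linear iteration.
    struct Residual {                       // r = x - 1  (toy residual)
        __host__ __device__ double operator()(double x) const { return x - 1.0; }
    };
    struct Update {                         // x = x - omega * r  (toy correction)
        double omega;
        __host__ __device__ double operator()(double x, double r) const { return x - omega * r; }
    };

    // All vectors stay on the device; per iteration only the squared residual norm
    // crosses the PCIe bus for the convergence test.
    int main() {
        const int n = 1 << 20;
        thrust::device_vector<double> x(n, 0.0), r(n);

        for (int it = 0; it < 100; ++it) {
            thrust::transform(x.begin(), x.end(), r.begin(), Residual{});
            thrust::transform(x.begin(), x.end(), r.begin(), x.begin(), Update{0.5});
            const double res2 = thrust::inner_product(r.begin(), r.end(), r.begin(), 0.0);
            if (std::sqrt(res2) < 1e-10) { std::printf("converged after %d iterations\n", it + 1); return 0; }
        }
        return 0;
    }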