Title :
Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures
Author :
Chandramowlishwaran, Aparna ; Williams, Samuel ; Oliker, Leonid ; Lashuk, Ilya ; Biros, George ; Vuduc, Richard
Author_Institution :
CRD, Lawrence Berkeley Nat. Lab., Berkeley, CA, USA
Abstract :
This work presents the first extensive study of single-node performance optimization, tuning, and analysis of the fast multipole method (FMM) on modern multi-core systems. We consider single- and double-precision with numerous performance enhancements, including low-level tuning, numerical approximation, data structure transformations, OpenMP parallelization, and algorithmic tuning. Among our numerous findings, we show that optimization and parallelization can improve double-precision performance by 25× on Intel's quad-core Nehalem, 9.4× on AMD's quad-core Barcelona, and 37.6× on Sun's Victoria Falls (dual-sockets on all systems). We also compare our single-precision version against our prior state-of-the-art GPU-based code and show, surprisingly, that the most advanced multicore architecture (Nehalem) reaches parity in both performance and power efficiency with NVIDIA's most advanced GPU architecture.
Keywords :
multiprocessing systems; optimisation; GPU architecture; Nehalem; OpenMP parallelization; algorithmic tuning; data structure transformations; double-precision performance; fast multipole method; low-level tuning; multicore architectures; numerical approximation; single-node performance optimization; Approximation algorithms; Computer architecture; Data structures; Educational institutions; Kernel; Laboratories; Multicore processing; Optimization methods; Performance analysis; Sun;
Conference_Title :
Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on
Conference_Location :
Atlanta, GA
Print_ISBN :
978-1-4244-6442-5
DOI :
10.1109/IPDPS.2010.5470415