Title :
Understanding Performance Portability of OpenACC for Supercomputers
Author :
Suttinee Sawadsitang;James Lin;Simon See;Francois Bodin;Satoshi Matsuoka
Author_Institution :
Shanghai Jiao Tong Univ., Shanghai, China
fDate :
5/1/2015 12:00:00 AM
Abstract :
Scientific applications need to be moved among supercomputers, such as Tianhe-2 and TSUBAME 2.5. OpenACC provides a directive-based approach for a single source code base with functional portability across the different accelerators used in these supercomputers. However, performance portability is not guaranteed by the OpenACC standard. Therefore, we propose a systematic optimization method, instead of auto-tuning by compilers, to achieve reasonable portable performance with minor code modifications. With this method, we evaluate four kernels from the Rodinia benchmark suite and one mini-application, Hydro, on our hybrid "CPU+GPU+MIC" supercomputer π with the CAPS and PGI compilers. We analyze Parallel Thread Execution (PTX) code to further understand the performance portability, and find that CAPS adopts a different thread distribution strategy from PGI. The evaluation results show that the optimized OpenACC versions can achieve a better performance portability ratio than the OpenCL version in some cases. This understanding and the method are valuable for OpenACC application developers who want to use the available OpenACC compilers efficiently and correctly.
Keywords :
"Graphics processing units","Microwave integrated circuits","Optimization","Kernel","Supercomputers","Instruction sets","Standards"
Conference_Titel :
Parallel and Distributed Processing Symposium Workshop (IPDPSW), 2015 IEEE International
DOI :
10.1109/IPDPSW.2015.60