DocumentCode :
3574905
Title :
Enabling PGAS Productivity with Hardware Support for Shared Address Mapping: A UPC Case Study
Author :
Serres, Olivier ; Kayi, Abdullah ; Anbar, Ahmad ; El Ghazawi, Tarek
Author_Institution :
George Washington Univ., Washington, DC, USA
fYear :
2014
Firstpage :
1
Lastpage :
10
Abstract :
The Partitioned Global Address Space (PGAS) programming model strikes a balance between the locality-aware, but explicit, message-passing model (e.g., MPI) and the easy-to-use, but locality-agnostic, shared memory model (e.g., OpenMP). However, the rich PGAS memory model comes at a performance cost that can hinder its potential for scalability and performance. To contain this overhead and achieve full performance, compiler optimizations may not be sufficient, and manual optimizations are typically added. This, however, can severely limit the productivity advantage. Such optimizations are usually targeted at reducing address translation overheads for shared data structures. This paper proposes hardware architectural support for PGAS, which allows the processor to handle shared addresses efficiently. This eliminates the need for such hand-tuning while maintaining the performance and productivity of PGAS languages. We propose to make this hardware support available to compilers by introducing new instructions to efficiently access and traverse the PGAS memory space. A prototype compiler is realized by extending the Berkeley Unified Parallel C (UPC) compiler. It allows unmodified code to use the new instructions without user intervention, thereby creating a truly productive programming environment. Two different implementations of the system are realized: the first is implemented using the full-system simulator Gem5, which allows the evaluation of the performance gain. The second is implemented using the soft-core processor Leon3 on an FPGA to verify the implementability and to parameterize the cost of the new hardware and its instructions. The new instructions show promising results for the NAS Parallel Benchmarks implemented in UPC. A speedup of up to 5.5x is demonstrated for unmodified codes. Unmodified code performance using this hardware was also shown to surpass the performance of manually optimized code by up to 10%.
Keywords :
message passing; optimising compilers; shared memory systems; Berkeley Unified Parallel C compiler; FPGA; Gem5; NAS Parallel Benchmarks; PGAS languages; PGAS memory space; PGAS productivity; PGAS programming model; PGAS rich memory model; UPC compiler; compiler optimizations; hand-tuning; hardware architectural support; hardware support; locality-agnostic model; locality-aware model; message-passing model; partitioned global address space; productivity advantage; prototype compiler; real productive programming environment; shared address mapping; shared addresses; shared data structures; shared memory model; soft core processor Leon3; unmodified code performance; Arrays; Electronics packaging; Hardware; Instruction sets; Optimization; Productivity; Registers; High Performance Computing; Parallel architectures; Parallel programming;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS)
Print_ISBN :
978-1-4799-6122-1
Type :
conf
DOI :
10.1109/HPCC.2014.8
Filename :
7056590