DocumentCode :
1783208
Title :
Energy Efficient HPC on Embedded SoCs: Optimization Techniques for Mali GPU
Author :
Grasso, Ivan ; Radojkovic, Petar ; Rajovic, Nikola ; Gelado, Isaac ; Ramirez, Alex
Author_Institution :
Barcelona Supercomput. Center, Barcelona, Spain
fYear :
2014
fDate :
19-23 May 2014
Firstpage :
123
Lastpage :
132
Abstract :
A lot of effort from academia and industry has been invested in exploring the suitability of low-power embedded technologies for HPC. Although state-of-the-art embedded systems-on-chip (SoCs) inherently contain GPUs that could be used for HPC, their performance and energy capabilities have never been evaluated. Two reasons contribute to this. First, until now embedded GPUs have not supported 64-bit floating point arithmetic - a requirement for HPC. Second, embedded GPUs did not provide support for parallel programming languages such as OpenCL and CUDA. However, the situation is changing, and the latest GPUs integrated in embedded SoCs do support 64-bit floating point precision and parallel programming models. In this paper, we analyze the performance and energy advantages of embedded GPUs for HPC. In particular, we analyze the ARM Mali-T604 GPU - the first embedded GPU with OpenCL Full Profile support. We identify, implement and evaluate software optimization techniques for efficient utilization of the ARM Mali GPU Compute Architecture. Our results show that HPC benchmarks running on the ARM Mali-T604 GPU integrated into the Exynos 5250 SoC achieve, on average, a speed-up of 8.7X over a single Cortex-A15 core, while consuming only 32% of the energy. Overall, the results show that embedded GPUs have performance and energy qualities that make them candidates for future HPC systems.
Keywords :
graphics processing units; optimisation; parallel architectures; power aware computing; system-on-chip; 64-bit floating point arithmetic; 64-bit floating point precision; ARM Mali GPU compute architecture; ARM Mali-T604 GPU; CUDA; Exynos 5250 SoC; HPC; OpenCL; OpenCL Full Profile support; embedded GPU; embedded SoC; embedded systems-on-chip; energy efficient HPC; optimization techniques; parallel programming languages; parallel programming models; single Cortex-A15 core; software optimization techniques; Benchmark testing; Computer architecture; Graphics processing units; Kernel; Optimization; System-on-chip; Vectors; Embedded GPUs; High performance computing; Optimization; Performance analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel and Distributed Processing Symposium, 2014 IEEE 28th International
Conference_Location :
Phoenix, AZ
ISSN :
1530-2075
Print_ISBN :
978-1-4799-3799-8
Type :
conf
DOI :
10.1109/IPDPS.2014.24
Filename :
6877248