DocumentCode
1783208
Title
Energy Efficient HPC on Embedded SoCs: Optimization Techniques for Mali GPU
Author
Grasso, Ivan ; Radojkovic, Petar ; Rajovic, Nikola ; Gelado, Isaac ; Ramirez, Adrian
Author_Institution
Barcelona Supercomput. Center, Barcelona, Spain
fYear
2014
fDate
19-23 May 2014
Firstpage
123
Lastpage
132
Abstract
Academia and industry have invested considerable effort in exploring the suitability of low-power embedded technologies for HPC. Although state-of-the-art embedded systems-on-chip (SoCs) contain GPUs that could be used for HPC, their performance and energy capabilities have never been evaluated. Two reasons contribute to this. First, embedded GPUs have not, until now, supported 64-bit floating point arithmetic, a requirement for HPC. Second, embedded GPUs did not provide support for parallel programming languages such as OpenCL and CUDA. However, the situation is changing, and the latest GPUs integrated in embedded SoCs do support 64-bit floating point precision and parallel programming models. In this paper, we analyze the performance and energy advantages of embedded GPUs for HPC. In particular, we analyze the ARM Mali-T604 GPU, the first embedded GPU with OpenCL Full Profile support. We identify, implement and evaluate software optimization techniques for efficient utilization of the ARM Mali GPU Compute Architecture. Our results show that HPC benchmarks running on the ARM Mali-T604 GPU integrated into the Exynos 5250 SoC achieve, on average, a speed-up of 8.7X over a single Cortex-A15 core while consuming only 32% of the energy. Overall, the results show that embedded GPUs have performance and energy qualities that make them candidates for future HPC systems.
Keywords
graphics processing units; optimisation; parallel architectures; power aware computing; system-on-chip; 64-bit floating point arithmetic; 64-bit floating point precision; ARM Mali GPU compute architecture; ARM Mali-T604 GPU; CUDA; Exynos 5250 SoC; HPC; OpenCL; OpenCL Full Profile support; embedded GPU; embedded SoC; embedded systems-on-chip; energy efficient HPC; optimization techniques; parallel programming languages; parallel programming models; single Cortex-A15 core; software optimization techniques; Benchmark testing; Computer architecture; Graphics processing units; Kernel; Optimization; System-on-chip; Vectors; Embedded GPUs; High performance computing; Optimization; Performance analysis;
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel and Distributed Processing Symposium, 2014 IEEE 28th International
Conference_Location
Phoenix, AZ
ISSN
1530-2075
Print_ISBN
978-1-4799-3799-8
Type
conf
DOI
10.1109/IPDPS.2014.24
Filename
6877248