• DocumentCode
    1783208
  • Title

    Energy Efficient HPC on Embedded SoCs: Optimization Techniques for Mali GPU

  • Author

    Grasso, Ivan ; Radojkovic, Petar ; Rajovic, Nikola ; Gelado, Isaac ; Ramirez, Adrian

  • Author_Institution
    Barcelona Supercomput. Center, Barcelona, Spain
  • fYear
    2014
  • fDate
    19-23 May 2014
  • Firstpage
    123
  • Lastpage
    132
  • Abstract
    Academia and industry have invested considerable effort in exploring the suitability of low-power embedded technologies for HPC. Although state-of-the-art embedded systems-on-chip (SoCs) inherently contain GPUs that could be used for HPC, their performance and energy capabilities have never been evaluated. Two reasons account for this. First, embedded GPUs have not, until now, supported 64-bit floating point arithmetic, a requirement for HPC. Second, embedded GPUs have not supported parallel programming languages such as OpenCL and CUDA. However, the situation is changing, and the latest GPUs integrated in embedded SoCs do support 64-bit floating point precision and parallel programming models. In this paper, we analyze the performance and energy advantages of embedded GPUs for HPC. In particular, we analyze the ARM Mali-T604 GPU, the first embedded GPU with OpenCL Full Profile support. We identify, implement and evaluate software optimization techniques for efficient utilization of the ARM Mali GPU Compute Architecture. Our results show that HPC benchmarks running on the ARM Mali-T604 GPU integrated into the Exynos 5250 SoC achieve, on average, a speed-up of 8.7X over a single Cortex-A15 core while consuming only 32% of the energy. Overall, the results show that embedded GPUs have performance and energy qualities that make them candidates for future HPC systems.
  • Keywords
    graphics processing units; optimisation; parallel architectures; power aware computing; system-on-chip; 64-bit floating point arithmetic; 64-bit floating point precision; ARM Mali GPU compute architecture; ARM Mali-T604 GPU; CUDA; Exynos 5250 SoC; HPC; OpenCL; OpenCL Full Profile support; embedded GPU; embedded SoC; embedded systems-on-chip; energy efficient HPC; optimization techniques; parallel programming languages; parallel programming models; single Cortex-A15 core; software optimization techniques; Benchmark testing; Computer architecture; Graphics processing units; Kernel; Optimization; System-on-chip; Vectors; Embedded GPUs; High performance computing; Optimization; Performance analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing Symposium, 2014 IEEE 28th International
  • Conference_Location
    Phoenix, AZ
  • ISSN
    1530-2075
  • Print_ISBN
    978-1-4799-3799-8
  • Type

    conf

  • DOI
    10.1109/IPDPS.2014.24
  • Filename
    6877248