• DocumentCode
    1997223
  • Title
    Use of SIMD Vector Operations to Accelerate Application Code Performance on Low-Powered ARM and Intel Platforms
  • Author
    Mitra, Gaurav ; Johnston, Benjamin ; Rendell, Alistair P. ; McCreath, Eric ; Zhou, Jun
  • Author_Institution
    Res. Sch. of Comput. Sci., Australian Nat. Univ., Canberra, ACT, Australia
  • fYear
    2013
  • fDate
    20-24 May 2013
  • Firstpage
    1107
  • Lastpage
    1116
  • Abstract
    Augmenting a processor with special hardware that is able to apply a Single Instruction to Multiple Data (SIMD) at the same time is a cost-effective way of improving processor performance. It also offers a means of improving the ratio of processor performance to power usage due to reduced and more effective data movement and intrinsically lower instruction counts. This paper considers and compares the NEON SIMD instruction set used on the ARM Cortex-A series of RISC processors with the SSE2 SIMD instruction set found on Intel platforms within the context of the Open Computer Vision (OpenCV) library. The performance obtained using compiler auto-vectorization is compared with that achieved using hand-tuning across a range of five different benchmarks and ten different hardware platforms. On the ARM platforms the hand-tuned NEON benchmarks were between 1.05× and 13.88× faster than the auto-vectorized code, while for the Intel platforms the hand-tuned SSE benchmarks were between 1.34× and 5.54× faster.
  • Keywords
    benchmark testing; microprocessor chips; parallel processing; performance evaluation; power aware computing; reduced instruction set computing; ARM Cortex-A series; NEON SIMD instruction set; OpenCV library; RISC processors; SIMD vector operations; SSE2 SIMD instruction set; application code performance acceleration; auto-vectorized code; compiler auto-vectorization; hand-tuned NEON benchmarks; hand-tuned SSE benchmarks; low-powered ARM platforms; low-powered Intel platforms; open computer vision library; processor performance; single instruction to multiple data; Assembly; Benchmark testing; Educational institutions; Graphics processing units; Image processing; Registers; Vectors; ARM; AVX; Low-Power; NEON; SIMD; SSE; Vectorization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2013 IEEE 27th International
  • Conference_Location
    Cambridge, MA
  • Print_ISBN
    978-0-7695-4979-8
  • Type
    conf
  • DOI
    10.1109/IPDPSW.2013.207
  • Filename
    6650996