DocumentCode :
3181989
Title :
Energy-Efficient Acceleration of OpenCV Saliency Computation Using Soft Vector Processors
Author :
Hegde, Gopalakrishna ; Kapre, Nachiket
Author_Institution :
Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore, Singapore
fYear :
2015
fDate :
2-6 May 2015
Firstpage :
76
Lastpage :
83
Abstract :
Soft vector processors in embedded FPGA platforms, such as the VectorBlox MXP engine, can match the performance and exceed the energy efficiency of commercial off-the-shelf embedded SoCs with SIMD or GPU accelerators for OpenCV applications such as saliency detection. We are also able to beat spatial hardware designs built from high-level synthesis while requiring significantly lower programming effort. These improvements are possible through careful scheduling of DMA operations to the vector engine, extensive use of line buffering to enhance data reuse on the FPGA, and limited use of scalar fallback for non-vectorizable code. The driving principle is to keep data and computation on the FPGA for as long as possible, both to exploit parallelism and data locality and to lower the energy requirements of communication. Using our approach, we outperform all platforms in our architecture comparison while needing less energy. At 640×480 image resolution, our implementation of the MXP soft vector processor on the Xilinx ZedBoard exceeds the performance of the Jetson TK1 GPU by 1.5× while needing 1.6× less energy, the BeagleBone Black by 4.7× at 2.3× less energy, the Raspberry Pi by 9× at 4× less energy, and the Intel Galileo by 28× at 16× less energy. Our vector implementation also outperforms the OpenCV library implementation generated by Vivado HLS by 1.5×.
Keywords :
embedded systems; field programmable gate arrays; graphics processing units; multiprocessing systems; scheduling; system-on-chip; DMA operation scheduling; GPU accelerators; Jetson TK1 GPU; OpenCV saliency computation; Raspberry Pi; SIMD accelerators; VectorBlox MXP engine; Vivado HLS; Xilinx ZedBoard; embedded FPGA platforms; embedded SoC; energy-efficient acceleration; field programmable gate array; graphics processing unit; high-level synthesis; saliency detection; soft vector processors; system-on-chip; Engines; Field programmable gate arrays; Hardware; Optimization; Random access memory; System-on-chip; Vector processors;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Field-Programmable Custom Computing Machines (FCCM), 2015 IEEE 23rd Annual International Symposium on
Conference_Location :
Vancouver, BC
Type :
conf
DOI :
10.1109/FCCM.2015.39
Filename :
7160043