Fast Convolution Operations on Many-Core Architectures

Author

Shigang Li;Yunquan Zhang;Chunyang Xiang;Lei Shi

Author_Institution

State Key Lab. of Comput. Archit., Inst. of Comput. Technol., Beijing, China

fYear

2015

Firstpage

316

Lastpage

323

Abstract

Convolution operations are widely used in many important application domains, such as deep learning and computer vision, in which convolution is always the most time-consuming part. High computational throughput and memory bandwidth make many-core architectures promising targets for accelerating these applications. In this paper, we implement and optimize different convolution operations, including 1D convolution, 2D convolution, and multi-channel 2D convolution executed in mini-batch mode, on both GPU and Intel MIC many-core architectures. We find that the performance bottleneck of 1D and 2D convolutions lies in registers rather than in local memory or the L1/L2 cache, and we therefore use register tiling to improve performance. In addition, we present a novel solution for multi-channel 2D convolution, in which convolution is conducted on images directly instead of being translated to matrix multiplication, and the data reuse of the algorithm is fully exploited. We further summarize the autotuning parameters for multi-channel 2D convolution and prune the search space based on heuristics. The experimental results show that, for large filter sizes, our solution achieves up to a 33% performance improvement over cuDNN-v2 on GTX TITAN and up to 28% over a clBLAS-based implementation on AMD W8000. On Intel MIC, our solution achieves up to 25% of the theoretical peak performance.

Keywords

"Convolution","Instruction sets","Registers","Computer architecture","Microwave integrated circuits","Filtering algorithms","Neural networks"

Publisher

IEEE

Conference_Titel

High Performance Computing and Communications (HPCC), 2015 IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), 2015 IEEE 12th International Conference on Embedded Software and Systems (ICESS), 2015 IEEE 17th International Conference on

Type

conf

DOI

10.1109/HPCC-CSS-ICESS.2015.94

Filename

7336182