• DocumentCode
    3696988
  • Title

    Fast Convolution Operations on Many-Core Architectures

  • Author

    Shigang Li;Yunquan Zhang;Chunyang Xiang;Lei Shi

  • Author_Institution
    State Key Lab. of Comput. Archit., Inst. of Comput. Technol., Beijing, China
  • fYear
    2015
  • Firstpage
    316
  • Lastpage
    323
  • Abstract
    Convolution operations have been widely used in many important application domains, such as deep learning and computer vision, in which convolution is always the most time-consuming part. High computational throughput and memory bandwidth make many-core architectures promising targets to accelerate these applications. In this paper, we implement and optimize different convolution operations, including 1D convolution, 2D convolution and multi-channel 2D convolution executed in mini-batch mode, on both GPU and Intel MIC many-core architectures. We find that the performance bottleneck of 1D and 2D convolutions is on registers rather than local memory or L1/L2 cache, and therefore, register tiling is used to improve the performance. In addition, we present a novel solution for multi-channel 2D convolution, in which convolution is conducted on images directly instead of being translated to matrix multiplication, and the data reuse of the algorithm is fully exploited. We further summarize the parameters of autotuning for multi-channel 2D convolution and prune the search space based on heuristics. The experimental results show that, for the large filter size, our solution gets up to 33% performance improvement over cuDNN-v2 and up to 28% over clBLAS-based implementation, on GTX TITAN and AMD W8000 respectively. On Intel MIC, our solution gets up to 25% of the theoretical peak performance.
  • Keywords
    "Convolution","Instruction sets","Registers","Computer architecture","Microwave integrated circuits","Filtering algorithms","Neural networks"
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computing and Communications (HPCC), 2015 IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), 2015 IEEE 12th International Conference on Embedded Software and Systems (ICESS), 2015 IEEE 17th International Conference on
  • Type

    conf

  • DOI
    10.1109/HPCC-CSS-ICESS.2015.94
  • Filename
    7336182