• DocumentCode
    686370
  • Title

    CLSIFT: An Optimization Study of the Scale Invariance Feature Transform on GPUs

  • Author

    Weiyan Wang ; Yunquan Zhang ; Guoping Long ; Shengen Yan ; Haipeng Jia

  • Author_Institution
    Lab. of Parallel Software & Comput. Sci., Inst. of Software, Beijing, China
  • fYear
    2013
  • fDate
    13-15 Nov. 2013
  • Firstpage
    93
  • Lastpage
    100
  • Abstract
    Scale Invariance Feature Transform (SIFT) is well suited to image matching because of its invariance to image scaling, rotation, and slight changes in illumination or viewpoint. However, its high computational complexity makes it technically challenging to deploy SIFT in real-time applications. To address this problem, we propose CLSIFT, an OpenCL-based, highly accelerated, and performance-portable SIFT solution. The important optimization techniques employed in CLSIFT are: (1) Independent logical functions are merged into the same kernel to reuse data and reduce global memory traffic. (2) Loop buffers are introduced to reuse data and intermediate results. (3) A task queue is used to schedule threads that take the same branch together, removing branch divergence. (4) Data are partitioned according to statistical patterns to balance the workload among workgroups. (5) CPU time is overlapped, and better parallel strategies are used. With all of these efforts, CLSIFT processes lena.jpg at 74.2 FPS and 43.4 FPS on NVidia and AMD GPUs, respectively, much higher than the CPU's nearly 10 FPS and the 39.8 FPS and 13 FPS of SIFTGPU, the fastest known implementation. Moreover, using a quantitative comparison, we analyze the strategies that succeed in beating SIFTGPU, a well-known existing GPU implementation. Additionally, we observe and conclude that the NVidia GPU achieves better occupancy and performance owing to several factors. Finally, we summarize some techniques and empirical guiding principles that may be shared by other GPU applications.
  • Keywords
    graphics processing units; image matching; optimisation; AMD GPUS; CLSIFT; CPU time; NVidia GPU; OpenCL; data partition; global memory traffic; image matching; image scaling; loop buffer; optimization study; scale invariance feature transform; statics pattern; task queue; Accuracy; Graphics processing units; Histograms; Instruction sets; Kernel; Memory management; Optimization; GPU; Memory Access; OpenCL; Parallel Strategies; SIFT; Workload;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing (HPCC_EUC), 2013 IEEE 10th International Conference on
  • Conference_Location
    Zhangjiajie
  • Type

    conf

  • DOI
    10.1109/HPCC.and.EUC.2013.23
  • Filename
    6825550