• DocumentCode
    1799870
  • Title

    Accelerating Irregular Algorithms on GPGPUs Using Fine-Grain Hardware Worklists

  • Author

    Ji Yun Kim ; Batten, Christopher

  • Author_Institution
    Sch. of Electr. & Comput. Eng., Cornell Univ., Ithaca, NY, USA
  • fYear
    2014
  • fDate
    13-17 Dec. 2014
  • Firstpage
    75
  • Lastpage
    87
  • Abstract
    Although GPGPUs are traditionally used to accelerate workloads with regular control and memory-access structure, recent work has shown that GPGPUs can also achieve significant speedups on more irregular algorithms. Data-driven implementations of irregular algorithms are algorithmically more efficient than topology-driven implementations, but issues with memory contention and memory-access irregularity can make the former perform worse in certain cases. In this paper, we propose a novel fine-grain hardware work list for GPGPUs that addresses the weaknesses of data-driven implementations. We detail multiple work redistribution schemes of varying complexity that can be employed to improve load balancing. Furthermore, a virtualization mechanism supports seamless work spilling to memory. A convenient shared work list software API is provided to simplify using our proposed mechanisms when implementing irregular algorithms. We evaluate challenging irregular algorithms from the Lonestar GPU benchmark suite on a cycle-level simulator. Our findings show that data-driven implementations running on a GPGPU using the hardware work list outperform highly optimized software-based implementations of these benchmarks running on a baseline GPGPU with speedups ranging from 1.2 - 2.4× and marginal area overhead.
  • Keywords
    application program interfaces; benchmark testing; graphics processing units; multiprocessing systems; resource allocation; virtualisation; LonestarGPU benchmark suite; baseline GPGPU; cycle-level simulator; data-driven implementations; fine-grain hardware worklists; general-purpose graphics-processing units; hardware worklist; irregular algorithm acceleration; irregular algorithm implementation; load balancing improvement; marginal area overhead; memory contention; memory-access irregularity; shared worklist software API; virtualization mechanism; work redistribution schemes; work spilling; Benchmark testing; Hardware; Heuristic algorithms; Instruction sets; Kernel; Load management; Optimization;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Microarchitecture (MICRO), 2014 47th Annual IEEE/ACM International Symposium on
  • Conference_Location
    Cambridge
  • ISSN
    1072-4451
  • Type

    conf

  • DOI
    10.1109/MICRO.2014.24
  • Filename
    7011379