DocumentCode :
1799870
Title :
Accelerating Irregular Algorithms on GPGPUs Using Fine-Grain Hardware Worklists
Author :
Ji Yun Kim ; Batten, Christopher
Author_Institution :
Sch. of Electr. & Comput. Eng., Cornell Univ., Ithaca, NY, USA
fYear :
2014
fDate :
13-17 Dec. 2014
Firstpage :
75
Lastpage :
87
Abstract :
Although GPGPUs are traditionally used to accelerate workloads with regular control and memory-access structure, recent work has shown that GPGPUs can also achieve significant speedups on more irregular algorithms. Data-driven implementations of irregular algorithms are algorithmically more efficient than topology-driven implementations, but issues with memory contention and memory-access irregularity can make the former perform worse in certain cases. In this paper, we propose a novel fine-grain hardware work list for GPGPUs that addresses the weaknesses of data-driven implementations. We detail multiple work redistribution schemes of varying complexity that can be employed to improve load balancing. Furthermore, a virtualization mechanism supports seamless work spilling to memory. A convenient shared work list software API is provided to simplify using our proposed mechanisms when implementing irregular algorithms. We evaluate challenging irregular algorithms from the Lonestar GPU benchmark suite on a cycle-level simulator. Our findings show that data-driven implementations running on a GPGPU using the hardware work list outperform highly optimized software-based implementations of these benchmarks running on a baseline GPGPU with speedups ranging from 1.2 - 2.4× and marginal area overhead.
Keywords :
application program interfaces; benchmark testing; graphics processing units; multiprocessing systems; resource allocation; virtualisation; LonestarGPU benchmark suite; baseline GPGPU; cycle-level simulator; data-driven implementations; fine-grain hardware worklists; general-purpose graphics-processing units; hardware worklist; irregular algorithm acceleration; irregular algorithm implementation; load balancing improvement; marginal area overhead; memory contention; memory-access irregularity; shared worklist software API; virtualization mechanism; work redistribution schemes; work spilling; Benchmark testing; Hardware; Heuristic algorithms; Instruction sets; Kernel; Load management; Optimization;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Microarchitecture (MICRO), 2014 47th Annual IEEE/ACM International Symposium on
Conference_Location :
Cambridge
ISSN :
1072-4451
Type :
conf
DOI :
10.1109/MICRO.2014.24
Filename :
7011379
Link To Document :
بازگشت