Regional Information Center for Science and Technology - Accelerating Irregular Algorithms on GPGPUs Using Fine-Grain Hardware Worklists

DocumentCode :

1799870

Title :

Accelerating Irregular Algorithms on GPGPUs Using Fine-Grain Hardware Worklists

Author :

Kim, Ji Yun ; Batten, Christopher

Author_Institution :

Sch. of Electr. & Comput. Eng., Cornell Univ., Ithaca, NY, USA

fYear :

2014

fDate :

13-17 Dec. 2014

Firstpage :

Lastpage :

Abstract :

Although GPGPUs are traditionally used to accelerate workloads with regular control and memory-access structure, recent work has shown that GPGPUs can also achieve significant speedups on more irregular algorithms. Data-driven implementations of irregular algorithms are algorithmically more efficient than topology-driven implementations, but issues with memory contention and memory-access irregularity can make the former perform worse in certain cases. In this paper, we propose a novel fine-grain hardware worklist for GPGPUs that addresses the weaknesses of data-driven implementations. We detail multiple work redistribution schemes of varying complexity that can be employed to improve load balancing. Furthermore, a virtualization mechanism supports seamless work spilling to memory. A convenient shared worklist software API is provided to simplify using our proposed mechanisms when implementing irregular algorithms. We evaluate challenging irregular algorithms from the LonestarGPU benchmark suite on a cycle-level simulator. Our findings show that data-driven implementations running on a GPGPU using the hardware worklist outperform highly optimized software-based implementations of these benchmarks running on a baseline GPGPU, with speedups ranging from 1.2× to 2.4× and marginal area overhead.

Keywords :

application program interfaces; benchmark testing; graphics processing units; multiprocessing systems; resource allocation; virtualisation; LonestarGPU benchmark suite; baseline GPGPU; cycle-level simulator; data-driven implementations; fine-grain hardware worklists; general-purpose graphics-processing units; hardware worklist; irregular algorithm acceleration; irregular algorithm implementation; load balancing improvement; marginal area overhead; memory contention; memory-access irregularity; shared worklist software API; virtualization mechanism; work redistribution schemes; work spilling; Benchmark testing; Hardware; Heuristic algorithms; Instruction sets; Kernel; Load management; Optimization;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Microarchitecture (MICRO), 2014 47th Annual IEEE/ACM International Symposium on

Conference_Location :

Cambridge

ISSN :

1072-4451

Type :

conf

DOI :

10.1109/MICRO.2014.24

Filename :

7011379

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1799870