Design and implementation of a parallel priority queue on many-core architectures

Author

Xi He ; Agarwal, Deborah ; Prasad, Sushil K.

Author_Institution

Dept. of Comput. Sci., Georgia State Univ., Atlanta, GA, USA

fYear

2012

fDate

18-22 Dec. 2012

Firstpage

1

Lastpage

10

Abstract

An efficient parallel priority queue is at the core of the effort in parallelizing important non-numeric irregular computations such as discrete event simulation scheduling and branch-and-bound algorithms. GPGPUs can provide powerful computing platform for such non-numeric computations if an efficient parallel priority queue implementation is available. In this paper, aiming at fine-grained applications, we develop an efficient parallel heap system employing CUDA. To our knowledge, this is the first parallel priority queue implementation on many-core architectures, thus represents a breakthrough. By allowing wide heap nodes to enable thousands of simultaneous deletions of highest priority items and insertions of new items, and taking full advantage of CUDA´s data parallel SIMT architecture, we demonstrate up to 30-fold absolute speedup for relatively fine-grained compute loads compared to optimized sequential priority queue implementation on fast multicores. Compared to this, our optimized multicore parallelization of parallel heap yields only 2-3 fold speedup for such fine-grained loads. This parallelization of a tree-based data structure on GPGPUs provides a roadmap for future parallelizations of other such data structures.

Keywords

data structures; graphics processing units; multiprocessing systems; parallel architectures; queueing theory; CUDA; GPGPU; branch-and-bound algorithms; data parallel SIMT architecture; data structures; discrete event simulation scheduling; fast multicores; fine-grained applications; fine-grained compute loads; fine-grained loads; many-core architectures; nonnumeric computations; nonnumeric irregular computations; optimized multicore parallelization; optimized sequential priority queue implementation; parallel heap system; parallel priority queue design; parallel priority queue implementation; tree-based data structure; wide heap nodes;

fLanguage

English

Publisher

ieee

Conference_Titel

High Performance Computing (HiPC), 2012 19th International Conference on

Conference_Location

Pune

Print_ISBN

978-1-4673-2372-7

Electronic_ISBN

978-1-4673-2370-3

Type

conf

DOI

10.1109/HiPC.2012.6507490

Filename

6507490