• DocumentCode
    2037399
  • Title

    Design and implementation of a parallel priority queue on many-core architectures

  • Author

    Xi He ; Agarwal, Deborah ; Prasad, Sushil K.

  • Author_Institution
    Dept. of Comput. Sci., Georgia State Univ., Atlanta, GA, USA
  • fYear
    2012
  • fDate
    18-22 Dec. 2012
  • Firstpage
    1
  • Lastpage
    10
  • Abstract
    An efficient parallel priority queue is at the core of the effort in parallelizing important non-numeric irregular computations such as discrete event simulation scheduling and branch-and-bound algorithms. GPGPUs can provide powerful computing platform for such non-numeric computations if an efficient parallel priority queue implementation is available. In this paper, aiming at fine-grained applications, we develop an efficient parallel heap system employing CUDA. To our knowledge, this is the first parallel priority queue implementation on many-core architectures, thus represents a breakthrough. By allowing wide heap nodes to enable thousands of simultaneous deletions of highest priority items and insertions of new items, and taking full advantage of CUDA´s data parallel SIMT architecture, we demonstrate up to 30-fold absolute speedup for relatively fine-grained compute loads compared to optimized sequential priority queue implementation on fast multicores. Compared to this, our optimized multicore parallelization of parallel heap yields only 2-3 fold speedup for such fine-grained loads. This parallelization of a tree-based data structure on GPGPUs provides a roadmap for future parallelizations of other such data structures.
  • Keywords
    data structures; graphics processing units; multiprocessing systems; parallel architectures; queueing theory; CUDA; GPGPU; branch-and-bound algorithms; data parallel SIMT architecture; data structures; discrete event simulation scheduling; fast multicores; fine-grained applications; fine-grained compute loads; fine-grained loads; many-core architectures; nonnumeric computations; nonnumeric irregular computations; optimized multicore parallelization; optimized sequential priority queue implementation; parallel heap system; parallel priority queue design; parallel priority queue implementation; tree-based data structure; wide heap nodes;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    High Performance Computing (HiPC), 2012 19th International Conference on
  • Conference_Location
    Pune
  • Print_ISBN
    978-1-4673-2372-7
  • Electronic_ISBN
    978-1-4673-2370-3
  • Type

    conf

  • DOI
    10.1109/HiPC.2012.6507490
  • Filename
    6507490