Regional Information Center for Science and Technology - Guided Region-Based GPU Scheduling: Utilizing Multi-thread Parallelism to Hide Memory Latency

DocumentCode :

625608

Title :

Guided Region-Based GPU Scheduling: Utilizing Multi-thread Parallelism to Hide Memory Latency

Author :

Jianmin Chen ; Xi Tao ; Zhen Yang ; Jih-Kwon Peir ; Xiaoyuan Li ; Shih-Lien Lu

Author_Institution :

Dept. of CISE, Univ. of Florida, Gainesville, FL, USA

fYear :

2013

fDate :

20-24 May 2013

Firstpage :

441

Lastpage :

451

Abstract :

Modern platforms for general-purpose computation on graphics processing units (GPGPUs) exploit parallelism in applications by building massively parallel architectures and applying multithreading to hide instruction and memory latencies. Such architectures have become increasingly popular for parallel applications written in the CUDA and OpenCL programming languages. In this paper, we investigate thread scheduling algorithms on such highly threaded GPGPUs. Traditional round-robin scheduling schemes are inefficient at handling instruction execution and memory accesses with disparate latencies. We introduce a new GPGPU thread (warp) scheduling algorithm that enables a flexible round-robin distance for efficiently utilizing multithread parallelism and uses program-guided priority shifts among concurrent threads (warps) to allow more overlap between short-latency compute instructions and long-latency memory accesses. Performance evaluations demonstrate that the new scheduling algorithm improves the execution times of a set of kernels by an average of 12%, with a 52% reduction in scheduler stall cycles, over the fine-granularity round-robin scheme. We also present a thorough evaluation of various thread scheduling algorithms based on the number of hardware threads, the scheduling overhead, and the global memory latency.

Keywords :

concurrency control; graphics processing units; multi-threading; parallel architectures; performance evaluation; processor scheduling; CUDA programming language; GPGPU thread scheduling algorithm; GPGPU warp scheduling algorithm; OpenCL programming language; concurrent thread; concurrent warp; fine-granularity round-robin scheme; general-purpose computation-on-graphics processing units; global memory latency; guided region-based GPU scheduling; hardware threads; instruction execution; instruction hiding; kernel execution time improvement; long-latency memory access; massively-parallel architecture; memory latency hiding; multithread parallelism; performance evaluation; program-guided priority shift; round-robin distance; scheduler stall cycle reduction; scheduling overhead; short-latency compute instructions; thread scheduling algorithms; Graphics processing units; Instruction sets; Kernel; Scheduling; Scheduling algorithms; CUDA; GPGPU; multi-thread; thread (warp) scheduling;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Parallel & Distributed Processing (IPDPS), 2013 IEEE 27th International Symposium on

Conference_Location :

Boston, MA

ISSN :

1530-2075

Print_ISBN :

978-1-4673-6066-1

Type :

conf

DOI :

10.1109/IPDPS.2013.95

Filename :

6569832

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=625608