DocumentCode :
43519
Title :
Kernelet: High-Throughput GPU Kernel Executions with Dynamic Slicing and Scheduling
Author :
Jianlong Zhong ; Bingsheng He
Author_Institution :
School of Computer Engineering, Nanyang Technological University, Singapore
Volume :
25
Issue :
6
fYear :
2014
fDate :
June 2014
Firstpage :
1522
Lastpage :
1532
Abstract :
Graphics processors, or GPUs, have recently been widely used as accelerators in shared environments such as clusters and clouds. In such shared environments, many kernels from different users are submitted to GPUs, and throughput is an important metric for performance and total cost of ownership. Despite recently improved runtime support for concurrent GPU kernel executions, the GPU can be severely underutilized, resulting in suboptimal throughput. In this paper, we propose Kernelet, a runtime system to improve the throughput of concurrent kernel executions on the GPU. Kernelet combines transparent memory management and PCI-e data transfer techniques with dynamic slicing and scheduling techniques for kernel executions. With slicing, Kernelet divides a GPU kernel into multiple sub-kernels (namely, slices). Each slice has tunable occupancy to allow co-scheduling with other slices for high GPU utilization. We develop a novel Markov chain-based performance model to guide the scheduling decision. Our experimental results demonstrate up to 31 percent and 23 percent performance improvement on NVIDIA Tesla C2050 and GTX680 GPUs, respectively.
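The slicing idea in the abstract can be illustrated with a small host-side sketch. It assumes a kernel with a 1-D grid of thread blocks that is cut into consecutive slices; each slice would be launched as its own sub-kernel that adds a block offset to its local block index to recover the original one. The names `Slice`, `slice_kernel`, and `co_schedule` are hypothetical, the slice size stands in for the tunable occupancy knob, and the plain round-robin interleaving is only a placeholder for Kernelet's Markov chain-guided scheduler, which this sketch does not model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Slice:
    """A sub-kernel covering thread blocks [block_offset, block_offset + num_blocks)."""
    kernel: str
    block_offset: int
    num_blocks: int


def slice_kernel(kernel: str, grid_blocks: int, slice_blocks: int) -> List[Slice]:
    """Split a kernel's 1-D grid of thread blocks into consecutive slices.

    Hypothetical scheme: each slice would run as a separate launch, and the
    device code would use (local block index + block_offset) as its block ID.
    The last slice may be smaller when grid_blocks is not a multiple of slice_blocks.
    """
    if grid_blocks <= 0 or slice_blocks <= 0:
        raise ValueError("grid_blocks and slice_blocks must be positive")
    return [
        Slice(kernel, start, min(slice_blocks, grid_blocks - start))
        for start in range(0, grid_blocks, slice_blocks)
    ]


def co_schedule(a: List[Slice], b: List[Slice]) -> List[Slice]:
    """Interleave the slices of two kernels so they can be co-scheduled.

    Simple round-robin placeholder, not Kernelet's model-guided policy.
    """
    order: List[Slice] = []
    for i in range(max(len(a), len(b))):
        if i < len(a):
            order.append(a[i])
        if i < len(b):
            order.append(b[i])
    return order


if __name__ == "__main__":
    k1 = slice_kernel("K1", grid_blocks=10, slice_blocks=4)  # slices of 4, 4, 2 blocks
    k2 = slice_kernel("K2", grid_blocks=6, slice_blocks=3)   # slices of 3, 3 blocks
    for s in co_schedule(k1, k2):
        print(s.kernel, s.block_offset, s.num_blocks)
    # K1 0 4
    # K2 0 3
    # K1 4 4
    # K2 3 3
    # K1 8 2
```

Smaller slices give a scheduler finer-grained units to pair with slices of other kernels, at the cost of more launches; choosing that trade-off is what the paper's performance model is described as guiding.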
Keywords :
Markov processes; concurrency control; graphics processing units; operating system kernels; performance evaluation; processor scheduling; program slicing; storage management; GTX680 GPUs; Kernelet; Markov chain-based performance model; NVIDIA Tesla C2050; PCI-e data transfer techniques; concurrent GPU kernel executions; dynamic scheduling techniques; dynamic slicing techniques; graphics processors; high-throughput GPU kernel executions; runtime support; runtime system; shared environments; suboptimal throughput; total ownership cost; transparent memory management; Graphics processing units; Instruction sets; Kernel; Memory management; Optimal scheduling; Runtime; Throughput; GPGPU; Kernel slicing; Markov chain; performance modeling; task scheduling;
fLanguage :
English
Journal_Title :
IEEE Transactions on Parallel and Distributed Systems
Publisher :
IEEE
ISSN :
1045-9219
Type :
jour
DOI :
10.1109/TPDS.2013.257
Filename :
6624111