DocumentCode :
3599705
Title :
Using rCUDA to Reduce GPU Resource-Assignment Fragmentation Caused by Job Scheduler
Author :
Markthub, Pak ; Nomura, Akihiro ; Matsuoka, Satoshi
Author_Institution :
Tokyo Inst. of Technol., Tokyo, Japan
fYear :
2014
Firstpage :
105
Lastpage :
112
Abstract :
In heterogeneous supercomputers such as TSUBAME2.5, GPUs on some nodes in GPU batch queues are left idle even though jobs are waiting in the queues. This is caused by the GPU resource-assignment fragmentation problem. For example, when each node has three GPUs, as in TSUBAME2.5, a node that has already been assigned to a job requesting two GPUs per node cannot be assigned to another job requesting more than one GPU per node until the ongoing job finishes; hence, one GPU is left idle on that node. We examine this problem on TSUBAME2.5's GPU batch-queue system and present a scheduling algorithm that assigns rCUDA (a remote CUDA execution technology) to some processes of some jobs. Because rCUDA allows jobs to utilize the idle GPUs, the proposed scheduling algorithm can alleviate the problem. Using a job pattern obtained from a scheduler log of a TSUBAME2.5 GPU queue, our simulation shows that the proposed algorithm can decrease job lifetime (the time from a job's arrival until it finishes) by about 5% on average. Moreover, it can reduce the average number of idle GPUs by about 15%. In addition, even when the number of nodes serving jobs is reduced by around 4%, the proposed algorithm can keep the average job lifetime about the same as that of the scheduling algorithm currently used in the TSUBAME2.5 GPU queue.
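The fragmentation example in the abstract can be illustrated with a small sketch. This is not the paper's scheduling algorithm or simulator: the four-node cluster, the function names, and the pooling rule are assumptions made only for illustration. It counts the GPUs left idle when every node runs a job that uses two of its three GPUs, and compares how many two-GPU requests fit when GPUs must be local against how many would fit if idle GPUs on different nodes could be combined, as remote GPU access might allow.

```python
GPUS_PER_NODE = 3  # three GPUs per node, as stated in the abstract


def idle_gpus_per_node(gpus_in_use):
    """Return the number of idle GPUs on each node."""
    return [GPUS_PER_NODE - used for used in gpus_in_use]


def local_slots(idle, gpus_requested):
    """Count nodes that can satisfy the request with local GPUs only."""
    return sum(1 for free in idle if free >= gpus_requested)


def pooled_slots(idle, gpus_requested):
    """Hypothetical: requests that fit if idle GPUs could be pooled across nodes."""
    return sum(idle) // gpus_requested


# Assumed scenario: four nodes, each already running a job on two GPUs.
in_use = [2, 2, 2, 2]
idle = idle_gpus_per_node(in_use)
local = local_slots(idle, 2)
pooled = pooled_slots(idle, 2)
print(idle, local, pooled)
```

In this assumed scenario each node has one idle GPU, so no waiting two-GPU request fits locally, while pooling the four idle GPUs would serve two such requests. The sketch ignores the data-transfer cost of remote execution, which a real scheduler would have to weigh.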
Keywords :
graphics processing units; parallel architectures; parallel machines; processor scheduling; GPU resource-assignment fragmentation reduction; TSUBAME2.5 GPU batch-queue system; heterogeneous supercomputers; job pattern; job scheduler; rCUDA; remote CUDA execution technology; scheduling algorithm; Bandwidth; Data models; Data transfer; Graphics processing units; Scheduling; Scheduling algorithms; Servers; GPU execution; GPU queue; rCUDA; remote GPU execution; scheduling algorithm;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel and Distributed Computing, Applications and Technologies (PDCAT), 2014 15th International Conference on
Type :
conf
DOI :
10.1109/PDCAT.2014.26
Filename :
7174773