Title :
ScalScheduling: A Scalable Scheduling Architecture for MPI-based interactive analysis programs
Author :
Jiangling Yin ; Andrew Foran ; Xuhong Zhang ; Jun Wang
Author_Institution :
EECS, Univ. of Central Florida, Orlando, FL, USA
Abstract :
In today's large-scale clusters, running tasks with high degrees of parallelism allows interactive data visualization and analysis to complete in seconds. However, conventional centralized scheduling poses significant challenges for these interactive applications. As the amount of data to be processed grows, it becomes too costly to move across the network. Thus, data processing tasks should be scheduled such that the amount of transferred data is minimized, i.e., realizing data locality computation. To implement this, a scheduler process must collect and analyze data distribution metadata prior to making scheduling decisions, which usually causes milliseconds or seconds of latency. Such scheduling delay is unacceptable for interactive data applications. In this paper, we present a Scalable Scheduling Architecture for conventional interactive data programs and refer to it as ScalScheduling. ScalScheduling is designed to reduce task scheduling latency while ensuring that the worker processes achieve a high degree of data locality computation and load balance in heterogeneous environments. In our proposed architecture, each worker process uses a novel modulo-based priority method to schedule its local tasks independently. Multiple scheduler processes are employed, in proportion to the number of worker processes, to resolve the issue of concurrent requests and to assign remote tasks with respect to load balance. We perform experiments using thousands of parallel processes, and the experimental results show the benefits of our proposed scheduling architecture as well as its potential for future oversize task scheduling problems on large-scale clusters.
Keywords :
message passing; parallel processing; MPI; ScalScheduling; data distribution metadata; data locality computation; interactive analysis program; interactive data visualization; large scale cluster; modulo-based priority method; parallel process; scalable scheduling architecture; scheduling delay; task scheduling latency; Computer architecture; Data processing; Distributed databases; Process control; Processor scheduling; Schedules; Scheduling;
Conference_Title :
Computer Communication and Networks (ICCCN), 2014 23rd International Conference on
Conference_Location :
Shanghai
DOI :
10.1109/ICCCN.2014.6911753