Title :
MOMC: Multi-objective and Multi-constrained Scheduling Algorithm of Many Tasks in Hadoop
Author :
Voicu, Cristiana ; Pop, Florin ; Dobre, C. ; Xhafa, Fatos
Author_Institution :
Univ. Politeh. of Bucharest, Bucharest, Romania
Abstract :
Even though scheduling in distributed systems has been debated for many years, platforms and job types change every day. This is why we need specialized algorithms based on new application requirements, especially when an application is deployed in a Cloud environment. One of the most important frameworks used for large-scale data processing in Clouds is Hadoop, together with its extensions. The Hadoop framework comes with default scheduling algorithms such as FIFO, Fair Scheduler, Capacity Scheduler, and Hadoop on Demand. Each of these scheduling algorithms focuses on a single, different constraint. It is hard to satisfy multiple constraints and to pursue several objectives at the same time. After summarizing the most common schedulers and showing the need each one addressed at the moment it appeared on the market, this paper presents MOMC, a multi-objective and multi-constrained scheduling algorithm for many tasks in Hadoop. The MOMC implementation focuses on two objectives, avoiding resource contention and achieving an optimal cluster workload, and on two constraints, deadline and budget. To compare the algorithms on different metrics, we use the Scheduling Load Simulator, which is integrated in the Hadoop framework and helps developers spend less time on testing. As a killer application that generates many tasks, we have chosen a processing task for the Million Song Dataset, a data set that contains metadata for one million commercially available songs.
Keywords :
data handling; distributed processing; scheduling; FIFO algorithm; Hadoop framework; Hadoop-on-demand algorithm; MOMC scheduling algorithm; Million Song Dataset; budget constraint; capacity scheduler algorithm; cloud environment; deadline constraint; distributed system; fair scheduler algorithm; large-scale data processing; multiobjective multiconstrained scheduling algorithm; scheduling load simulator; task scheduling; Clustering algorithms; Containers; History; Measurement; Scheduling; Scheduling algorithms; Big Data; Cloud Computing; Hadoop; Map Reduce; Task Scheduling
Conference_Title :
P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC), 2014 Ninth International Conference on
Conference_Location :
Guangdong
DOI :
10.1109/3PGCIC.2014.40