• DocumentCode
    246986
  • Title

    MOMC: Multi-objective and Multi-constrained Scheduling Algorithm of Many Tasks in Hadoop

  • Author

    Voicu, Cristiana ; Pop, Florin ; Dobre, C. ; Xhafa, Fatos

  • Author_Institution
    Univ. Politeh. of Bucharest, Bucharest, Romania
  • fYear
    2014
  • fDate
    8-10 Nov. 2014
  • Firstpage
    89
  • Lastpage
    96
  • Abstract
    Although scheduling in distributed systems has been debated for many years, platforms and job types change every day. New application requirements therefore call for specialized algorithms, especially when an application is deployed in a Cloud environment. One of the most important frameworks used for large-scale data processing in Clouds is Hadoop, together with its extensions. The Hadoop framework comes with default algorithms such as FIFO, Fair Scheduler, Capacity Scheduler, and Hadoop on Demand. Each of these scheduling algorithms focuses on a different, single constraint. It is hard to satisfy multiple constraints and pursue several objectives at the same time. After summarizing the most common schedulers and showing the need each one addressed when it appeared on the market, this paper presents MOMC, a multi-objective and multi-constrained scheduling algorithm for many tasks in Hadoop. The MOMC implementation focuses on two objectives, avoiding resource contention and achieving an optimal cluster workload, and on two constraints, deadline and budget. To compare the algorithms on different metrics, we use the Scheduling Load Simulator, which is integrated in the Hadoop framework and helps developers spend less time on testing. As a killer application that generates many tasks, we have chosen a processing task for the Million Song Dataset, a data set that contains metadata for one million commercially available songs.
  • Keywords
    data handling; distributed processing; scheduling; FIFO algorithm; Hadoop framework; Hadoop-on-demand algorithm; MOMC scheduling algorithm; Million Song Dataset; budget constraint; capacity scheduler algorithm; cloud environment; deadline constraint; distributed system; fair scheduler algorithm; large-scale data processing; multiobjective multiconstrained scheduling algorithm; scheduling load simulator; task scheduling; Clustering algorithms; Containers; History; Measurement; Scheduling; Scheduling algorithms; Big Data; Cloud Computing; Hadoop; Map Reduce; Task Scheduling;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC), 2014 Ninth International Conference on
  • Conference_Location
    Guangdong
  • Type

    conf

  • DOI
    10.1109/3PGCIC.2014.40
  • Filename
    7024563