• DocumentCode
    3360589
  • Title
    A scheduling mechanism for multiple MapReduce jobs in a workflow application (position paper)
  • Author
    Yoo, Dongjin ; Sim, Kwang Mong
  • Author_Institution
    Multi-Agent & Cloud Comput. Syst. Lab., Gwangju Inst. of Sci. & Technol. (GIST), Gwangju, South Korea
  • fYear
    2012
  • fDate
    11-13 Jan. 2012
  • Firstpage
    405
  • Lastpage
    410
  • Abstract
    MapReduce is currently an attractive model for data-intensive applications because of its simple programming interface, high scalability, and fault tolerance. It is well suited to applications that process large data sets on distributed processing resources, such as web data analysis, bioinformatics, and high-performance computing. Many studies address job scheduling for MapReduce in shared clusters. However, there remains a need to schedule workflow services composed of multiple MapReduce tasks with precedence dependencies across multiple processing nodes. The contribution of this paper is a scheduling mechanism for a workflow service containing multiple MapReduce jobs. The workflow application has precedence dependency constraints among its tasks, represented as a directed acyclic graph (DAG). In addition, to reduce data transfer cost under limited bisection bandwidth, data dependencies should be considered when scheduling multiple MapReduce jobs in a workflow. The proposed scheduling mechanism provides 1) scheduling of MapReduce tasks that respects precedence constraints and 2) a data pre-placement method that considers data dependency constraints to reduce data transfer cost over the network.
  • Keywords
    directed graphs; distributed processing; scheduling; software fault tolerance; Web data analysis; bio informatics; data intensive application; data transfer cost; directed acyclic graph; fault tolerance capability; high performance computing; high scalability; multiple MapReduce jobs; scheduling mechanism; workflow application; workflow service scheduling; Cloud computing; Computational modeling; Distributed databases; Fault tolerance; Fault tolerant systems; Processor scheduling; Synchronization; Cloud Computing; Data Intensive Computing; MapReduce; Scheduling; Workflow Application;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computing, Communications and Applications Conference (ComComAp), 2012
  • Conference_Location
    Hong Kong
  • Print_ISBN
    978-1-4577-1717-8
  • Type
    conf
  • DOI
    10.1109/ComComAp.2012.6154882
  • Filename
    6154882
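
The abstract describes a workflow whose MapReduce jobs are linked by precedence constraints in a directed acyclic graph (DAG). As a minimal illustration of what respecting those constraints means, the sketch below orders jobs with a standard topological sort (Kahn's algorithm). It is not the mechanism proposed in the paper, which this record does not detail; the function name `schedule_workflow` and the job names are hypothetical, and data placement is not modeled.

```python
from collections import deque


def schedule_workflow(deps):
    """Return jobs in an order that respects precedence constraints.

    `deps` maps each job name to the set of jobs that must finish first.
    This is a generic topological sort, not the paper's scheduler.
    """
    # Collect every job, including ones that only appear as a dependency.
    jobs = set(deps)
    for parents in deps.values():
        jobs.update(parents)

    # Count unfinished predecessors and record successors for each job.
    remaining = {job: len(deps.get(job, ())) for job in jobs}
    children = {job: [] for job in jobs}
    for job, parents in deps.items():
        for parent in parents:
            children[parent].append(job)

    # Jobs with no unfinished predecessors are ready to run.
    ready = deque(sorted(job for job, n in remaining.items() if n == 0))
    order = []
    while ready:
        job = ready.popleft()
        order.append(job)
        for child in sorted(children[job]):
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)

    if len(order) != len(jobs):
        raise ValueError("workflow contains a cycle, so it is not a DAG")
    return order


# Hypothetical five-job workflow: two branches join in a final report job.
workflow = {
    "extract": set(),
    "clean": {"extract"},
    "index": {"clean"},
    "stats": {"clean"},
    "report": {"index", "stats"},
}
order = schedule_workflow(workflow)
print(order)
```

In this sketch, `index` and `stats` become ready at the same time once `clean` finishes, so a scheduler would be free to run them in parallel on different nodes; the sort only fixes which jobs must wait for which.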