• DocumentCode
    3600007
  • Title
    Data Placement and Task Scheduling Optimization for Data Intensive Scientific Workflow in Multiple Data Centers Environment
  • Author
    Mingjun Wang ; Jinghui Zhang ; Fang Dong ; Junzhou Luo
  • Author_Institution
    Sch. of Comput. Sci. & Eng., Southeast Univ., Nanjing, China
  • fYear
    2014
  • Firstpage
    77
  • Lastpage
    84
  • Abstract
    Running data-intensive scientific workflows across multiple data centers incurs massive data transfers, which lowers the efficiency of real workflow applications for scientists. By considering data size and data dependency, we propose a k-means-based initial data placement strategy that places the most closely related initial data sets in the same data center at the workflow preparation stage. During workflow execution, by analyzing the interdependencies between data sets and tasks, we adopt a multilevel task replication strategy to reduce the volume of intermediate data transfer. Simulation results show that the proposed strategies effectively reduce data transfer among data centers and improve the performance of data-intensive scientific workflows.
  • Keywords
    computer centres; data handling; natural sciences computing; scheduling; data-intensive scientific workflow; initial data placement strategy; intermediate data transfer volume; k-means algorithm; multilevel task replication strategy; multiple data centers; task scheduling optimization; Algorithm design and analysis; Big data; Data models; Data transfer; Distributed databases; Processor scheduling; Scheduling; cloud computing; data center; data placement; multilevel task replication; scientific workflows;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advanced Cloud and Big Data (CBD), 2014 Second International Conference on
  • Print_ISBN
    978-1-4799-8086-4
  • Type
    conf
  • DOI
    10.1109/CBD.2014.19
  • Filename
    7176075