• DocumentCode
    3078874
  • Title

    A Data Placement Strategy for Data-Intensive Scientific Workflows in Cloud

  • Author

    Qing Zhao ; Congcong Xiong ; Xi Zhao ; Ce Yu ; Jian Xiao

  • Author_Institution
    Tianjin Univ. of Sci. & Technol., Tianjin, China
  • fYear
    2015
  • fDate
    4-7 May 2015
  • Firstpage
    928
  • Lastpage
    934
  • Abstract
    With the arrival of cloud computing and Big Data, many scientific applications with large amounts of data can be abstracted as scientific workflows and run in a cloud environment. Distributing these datasets intelligently can efficiently reduce data transfers during the workflow's execution. In this paper, we propose a 2-stage data placement strategy. In the initial stage, we cluster the datasets based on their correlation and allocate these clusters to data centers. Compared with existing work, we incorporate data size into the correlation calculation and propose a new type of data correlation for intermediate data, named "the first order conduction correlation". Hence the data transmission cost can be measured more reasonably. In the runtime stage, the re-distribution algorithm adjusts the data layout according to the changed factors, and the overhead of the re-layout itself is also measured. Compared with previous work, simulation results show that our proposed strategy can effectively reduce the time consumed by data movements during workflow execution.
  • Keywords
    Big Data; cloud computing; computer centres; pattern clustering; scientific information systems; workflow management software; 2-stage data placement strategy; data centers; data correlation; data redistribution algorithm; data transmission cost; data-intensive scientific workflows; dataset clustering; first order conduction correlation; Cloud computing; Correlation; Data transfer; Distributed databases; Layout; Partitioning algorithms; Runtime; data placement; data-intensive application; scientific work flow;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster, Cloud and Grid Computing (CCGrid), 2015 15th IEEE/ACM International Symposium on
  • Conference_Location
    Shenzhen
  • Type

    conf

  • DOI
    10.1109/CCGrid.2015.72
  • Filename
    7152578