Title :
A Data Placement Strategy for Data-Intensive Scientific Workflows in Cloud
Author :
Qing Zhao ; Congcong Xiong ; Xi Zhao ; Ce Yu ; Jian Xiao
Author_Institution :
Tianjin Univ. of Sci. & Technol., Tianjin, China
Abstract :
With the arrival of cloud computing and Big Data, many scientific applications with large amounts of data can be abstracted as scientific workflows and run in a cloud environment. Distributing these datasets intelligently can effectively reduce data transfers during the workflow's execution. In this paper, we propose a 2-stage data placement strategy. In the initial stage, we cluster the datasets based on their correlation and allocate these clusters to data centers. Compared with existing work, we incorporate data size into the correlation calculation and propose a new type of data correlation for intermediate data, named "the first order conduction correlation". Hence the data transmission cost can be measured more reasonably. In the runtime stage, the re-distribution algorithm adjusts the data layout according to the changed factors, and the overhead of the re-layout itself is also measured. Simulation results show that, compared with previous work, our proposed strategy can effectively reduce the time consumption of data movements during workflow execution.
Keywords :
Big Data; cloud computing; computer centres; pattern clustering; scientific information systems; workflow management software; 2-stage data placement strategy; data centers; data correlation; data redistribution algorithm; data transmission cost; data-intensive scientific workflows; dataset clustering; first order conduction correlation; Correlation; Data transfer; Distributed databases; Layout; Partitioning algorithms; Runtime; data placement; data-intensive application; scientific workflow;
Conference_Title :
Cluster, Cloud and Grid Computing (CCGrid), 2015 15th IEEE/ACM International Symposium on
Conference_Location :
Shenzhen
DOI :
10.1109/CCGrid.2015.72