DocumentCode :
3599717
Title :
Running Data-Intensive Scientific Workflows in the Cloud
Author :
Sato, Chiaki ; Leslie, Luke M. ; Lee, Young Choon ; Zomaya, Albert Y. ; Ranjan, Rajiv
Author_Institution :
Univ. of Sydney, Sydney, NSW, Australia
fYear :
2014
Firstpage :
180
Lastpage :
185
Abstract :
Scientific applications are growing increasingly large, not only in computation but also in data. Many of these applications consist of inter-related tasks with data dependencies; that is, they are scientific workflows. The efficient coordination of scientific workflow execution is of great practical importance, and the core of such coordination is scheduling and resource allocation. In this paper, we present three scheduling heuristics for running large-scale, data-intensive scientific workflows in clouds. The three heuristic algorithms are designed to leverage a slot queue threshold, data locality, and data prefetching, respectively. Although each heuristic can be used independently, we also demonstrate how they can be used together to tackle different issues in running data-intensive workflows in clouds. We have implemented the algorithms and incorporated them into our workflow execution system (DEWE). Using Montage, an astronomical image mosaic engine, as an example workflow and Amazon EC2 as the cloud environment, we evaluate the performance of our heuristics primarily in terms of completion time (makespan). We also examine the different phases of workflow execution to identify their impact on performance. Our algorithms scale well and reduce makespan by up to 27%.
Keywords :
cloud computing; resource allocation; scheduling; storage management; Amazon EC2; DEWE; Montage; astronomical image mosaic engine; cloud environment; completion time; data dependencies; data locality; data prefetching; data-intensive scientific workflows; inter-related tasks; makespan reduction; scheduling heuristics; scientific applications; scientific workflow running; slot queue threshold; workflow execution system; Algorithm design and analysis; Australia; Electronic mail; Prefetching; Scheduling algorithms; Data-Intensive; Scientific Workflows
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel and Distributed Computing, Applications and Technologies (PDCAT), 2014 15th International Conference on
Type :
conf
DOI :
10.1109/PDCAT.2014.30
Filename :
7174784