DocumentCode :
656215
Title :
Pipelining/Overlapping Data Transfer for Distributed Data-Intensive Job Execution
Author :
Eun-Sung Jung ; Maheshwari, Ketan ; Kettimuthu, Rajkumar
Author_Institution :
Math. & Comput. Sci. Div., Argonne Nat. Lab., Argonne, IL, USA
fYear :
2013
fDate :
1-4 Oct. 2013
Firstpage :
791
Lastpage :
797
Abstract :
Scientific workflows are attracting increasing attention as data and compute resources grow larger, more heterogeneous, and more distributed. Many scientific workflows are both compute intensive and data intensive and use distributed resources. This situation poses significant challenges for real-time remote analysis and for dissemination of massive datasets to scientists across the community, and these challenges will be exacerbated in the exascale era. Parallel jobs in scientific workflows are common, and such parallelism can be exploited by scheduling parallel jobs across multiple execution sites for enhanced performance. Previous scheduling algorithms such as heterogeneous earliest finish time (HEFT) did not focus on scheduling the thousands of jobs often seen in contemporary applications. Some techniques, such as task clustering, have been proposed to reduce the overhead of scheduling a large number of jobs. However, scheduling massively parallel jobs in distributed environments poses new challenges because data movement becomes a nontrivial factor. We propose efficient parallel execution models based on pipelined execution of data transfer, incorporating network bandwidth and the resources reserved at an execution site. We formally analyze these models and identify the best model along with its optimal degree of parallelism. We implement our model in the Swift parallel scripting paradigm using GridFTP. Experiments on real distributed computing resources show that our model with the optimal degree of parallelism outperforms the current parallel execution model, reducing total execution time by as much as 50%.
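The core idea in the abstract, overlapping the transfer of one data chunk with computation on the previous one, can be illustrated with a minimal producer/consumer sketch. This is not the paper's Swift/GridFTP implementation; the function names, the simulated timings, and the `depth` parameter (a stand-in for the degree of parallelism the paper optimizes) are all assumptions for illustration.

```python
import queue
import threading
import time

def transfer(chunk):
    # Simulated data transfer (stand-in for a GridFTP get); timing is arbitrary.
    time.sleep(0.01)
    return chunk

def compute(chunk):
    # Simulated per-chunk computation at the execution site.
    return chunk * 2

def pipelined_run(chunks, depth=4):
    """Overlap the transfer of chunk i+1 with computation on chunk i.

    `depth` bounds the number of transferred-but-unprocessed chunks,
    playing the role of the degree of parallelism in the paper's model.
    """
    q = queue.Queue(maxsize=depth)
    results = []

    def producer():
        # Transfers proceed concurrently with the consumer's compute loop.
        for c in chunks:
            q.put(transfer(c))
        q.put(None)  # sentinel: no more chunks

    t = threading.Thread(target=producer)
    t.start()
    while True:
        item = q.get()
        if item is None:
            break
        results.append(compute(item))
    t.join()
    return results
```

With sequential execution, total time is roughly the sum of transfer and compute times; with this overlap it approaches the maximum of the two, which is the source of the reported speedup.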
Keywords :
electronic data interchange; natural sciences computing; parallel processing; pipeline processing; scheduling; workflow management software; HEFT; Swift parallel scripting paradigm; data movement; data transfer pipelining; distributed computing resources; distributed data-intensive job execution; exascale era; heterogeneous earliest finish time; massive dataset dissemination; massive dataset real-time remote analysis; overlapping data transfer; parallel execution models; parallel job scheduling; scientific workflows; task clustering; Computational modeling; Data transfer; Equations; Mathematical model; Pipeline processing; Silicon;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel Processing (ICPP), 2013 42nd International Conference on
Conference_Location :
Lyon
ISSN :
0190-3918
Type :
conf
DOI :
10.1109/ICPP.2013.93
Filename :
6687418