DocumentCode :
2766705
Title :
SAGA BigJob: An Extensible and Interoperable Pilot-Job Abstraction for Distributed Applications and Systems
Author :
Luckow, Andre ; Lacinski, Lukasz ; Jha, Shantenu
Author_Institution :
Center for Comput. & Technol., Louisiana State Univ., Baton Rouge, LA, USA
fYear :
2010
fDate :
17-20 May 2010
Firstpage :
135
Lastpage :
144
Abstract :
The uptake of distributed infrastructures by scientific applications has been limited by the availability of extensible, pervasive and simple-to-use abstractions, which are required at multiple levels: the development, deployment and execution stages of scientific applications. The Pilot-Job abstraction has been shown to be effective in addressing many requirements of scientific applications. Specifically, Pilot-Jobs support the decoupling of workload submission from resource assignment; this results in a flexible execution model, which in turn enables the distributed scale-out of applications on multiple and possibly heterogeneous resources. Most Pilot-Job implementations, however, are tied to a specific infrastructure. In this paper, we describe the design and implementation of a SAGA-based Pilot-Job, which supports a wide range of application types and is usable over a broad range of infrastructures, i.e., it is general-purpose and extensible, and, as we will argue, is also interoperable with Clouds. We discuss how the SAGA-based Pilot-Job is used for different application types and supports concurrent usage across multiple heterogeneous distributed infrastructures, including concurrent usage across Clouds and traditional Grids/Clusters. Further, we show how Pilot-Jobs can help to support dynamic execution models and thus introduce new opportunities for distributed applications. We also demonstrate, for the first time to our knowledge, the use of multiple Pilot-Job implementations to solve the same problem: specifically, we use the SAGA-based Pilot-Job on high-end resources such as the TeraGrid and the native Condor Pilot-Job (Glide-in) on Condor resources. Importantly, both are invoked via the same interface without changes at the development or deployment level; choosing between them requires only an execution (run-time) decision.
Keywords :
Application software; Cloud computing; Computer science; Distributed computing; Grid computing; Production systems; Robust control; Runtime; Scalability; Virtual machining; Cloud; Distributed Computing; Grid; Pilot-Job; SAGA;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cluster, Cloud and Grid Computing (CCGrid), 2010 10th IEEE/ACM International Conference on
Conference_Location :
Melbourne, Australia
Print_ISBN :
978-1-4244-6987-1
Type :
conf
DOI :
10.1109/CCGRID.2010.91
Filename :
5493486