DocumentCode :
2017230
Title :
Programming Abstractions for Data Intensive Computing on Clouds and Grids
Author :
Miceli, Chris ; Miceli, Michael ; Jha, Shantenu ; Kaiser, Hartmut ; Merzky, Andre
Author_Institution :
Center for Comput. & Technol., Louisiana State Univ., Baton Rouge, LA
fYear :
2009
fDate :
18-21 May 2009
Firstpage :
478
Lastpage :
483
Abstract :
MapReduce has emerged as an important data-parallel programming model for data-intensive computing on Clouds and Grids. However, most, if not all, implementations of MapReduce are coupled to a specific infrastructure. SAGA is a high-level programming interface which provides the ability to create distributed applications in an infrastructure-independent way. In this paper, we show how MapReduce has been implemented using SAGA and demonstrate its interoperability across different distributed platforms: Grids, Cloud-like infrastructure, and Clouds. We discuss the advantages of programmatically developing MapReduce using SAGA by demonstrating that the SAGA-based implementation is infrastructure independent whilst still providing control over the deployment, distribution, and runtime decomposition. The ability to control the distribution and placement of the computation units (workers) is critical for moving computational work to the data. This is required to keep data network transfer low and, in the case of commercial Clouds, to keep the monetary cost of computing the solution low. Using data sets of up to 10 GB and up to 10 workers, we provide a detailed performance analysis of the SAGA-MapReduce implementation and show how controlling the distribution of computation and the payload per worker helps enhance performance.
Keywords :
application program interfaces; grid computing; parallel programming; MapReduce data-parallel programming model; SAGA programming interface; cloud computing; data intensive computing; data network transfer; distributed application; grid computing; programming abstraction; Cloud computing; Computer networks; Computer science; Costs; Distributed computing; Grid computing; Performance analysis; Runtime; Size control; USA Councils; SAGA; clouds grids; data intensive; mapreduce;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Cluster Computing and the Grid, 2009. CCGRID '09. 9th IEEE/ACM International Symposium on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4244-3935-5
Electronic_ISBN :
978-0-7695-3622-4
Type :
conf
DOI :
10.1109/CCGRID.2009.87
Filename :
5071908