Regional Information Center for Science and Technology (RICEST) - PigOut: Making multiple Hadoop clusters work together

DocumentCode :

1791541

Title :

PigOut: Making multiple Hadoop clusters work together

Author :

Kyungho Jeon ; Chandrashekhara, Sharath ; Feng Shen ; Mehra, Sanyam ; Kennedy, Oliver ; Ko, Steven Y.

Author_Institution :

SUNY - Univ. at Buffalo, Buffalo, NY, USA

fYear :

2014

fDate :

27-30 Oct. 2014

Firstpage :

100

Lastpage :

109

Abstract :

This paper presents PigOut, a system that enables federated data processing over multiple Hadoop clusters. Using PigOut, a user (such as a data analyst) can write a single script in a high-level language to efficiently use multiple Hadoop clusters. There is no need to manually write multiple scripts and coordinate the execution for different clusters. PigOut accomplishes this by automatically partitioning a single, user-supplied script into multiple scripts that run on different clusters. Additionally, PigOut generates workflow descriptions to coordinate execution across clusters. In doing so, PigOut leverages existing tools built around Hadoop, avoiding extra effort required from users or administrators. For example, PigOut uses Pig Latin, a popular query language for Hadoop MapReduce, in a (virtually) unmodified form. Through our evaluation with PigMix, the standard benchmark for Pig, we demonstrate that PigOut's automatically-generated scripts and workflow definitions have comparable performance to manual, hand-tuned ones. We also report our experience with manually writing multiple scripts for a set of federated clusters, and compare the process with PigOut's automated approach.

Keywords :

data handling; high level languages; parallel processing; pattern clustering; query languages; Hadoop MapReduce; Hadoop clusters; Pig Latin; PigMix; PigOut automatically-generated scripts; federated data processing; high-level language; query language; user-supplied script; workflow descriptions; Asia; Data processing; Europe; Manuals; Optimization; Programming; Writing;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Big Data (Big Data), 2014 IEEE International Conference on

Conference_Location :

Washington, DC

Type :

conf

DOI :

10.1109/BigData.2014.7004218

Filename :

7004218

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1791541