Title :
Efficient Geo-distributed Data Processing with Rout
Author :
Jayalath, Chamikara ; Eugster, Patrick
Author_Institution :
Purdue Univ., West Lafayette, IN, USA
Abstract :
Big data processing undoubtedly represents a major challenge of this era. While several programming models and supporting systems have been proposed to deal with such data in so-called "cloud" infrastructures, they all exhibit the same limitation: all data is assumed to be located in one datacenter. This limitation results from cloud vendors promoting the abstraction of omnipresent computing and storage resources. When dealing with data distributed across datacenters, programmers currently have two options. (1) Copying all data to a single datacenter easily becomes tedious if done manually whenever the original dataset is updated, leads to repetitive copying if performed as part of a program, and is sometimes impossible. (2) Writing multiple variants of the same program, with consolidation occurring at different points depending on characteristics of the task (e.g., input sub-dataset sizes), is laborious and does not help in determining the most appropriate variant for a given run. This paper introduces geo-distributed data structures and operations for expressing data processing tasks that take place across datacenters. We describe the design and implementation of such data structures and operations for the PigLatin language. We illustrate the performance benefits of our geo-distributed data structures and operations through several benchmarks, showing up to 2× faster response times.
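Why the right consolidation point in option (2) depends on task characteristics can be seen with a toy cost model. The following is a minimal sketch under stated assumptions, not Rout's actual planner: the function `best_consolidation_point`, the per-stage output/input ratios, and the choice to count only bytes crossing datacenter boundaries are all hypothetical. It assumes the stages run before consolidation can execute independently in each datacenter (e.g., filtering or partial aggregation) and shrink or grow every sub-dataset by the same ratio.

```python
from typing import Dict, List, Tuple


def best_consolidation_point(
    input_sizes: Dict[str, float],
    stage_ratios: List[float],
) -> Tuple[int, str, float]:
    """Hypothetical cost model: pick where to consolidate a staged pipeline.

    input_sizes  -- size of the input sub-dataset held by each datacenter
    stage_ratios -- assumed output/input size ratio of each pipeline stage
    Returns (k, target_dc, bytes_moved): run the first k stages locally in
    every datacenter, then copy the intermediate results to target_dc.
    """
    if not input_sizes:
        raise ValueError("need at least one datacenter")
    # With a uniform ratio per stage, the target's own share never crosses
    # the WAN, so the datacenter holding the most input is the cheapest target.
    target = max(input_sizes, key=input_sizes.get)
    remote = sum(size for dc, size in input_sizes.items() if dc != target)

    # k = 0 means "copy all raw input first" (option 1 in the abstract).
    best_k, best_cost = 0, remote
    scale = 1.0
    for k, ratio in enumerate(stage_ratios, start=1):
        scale *= ratio  # cumulative size factor after k local stages
        cost = remote * scale
        if cost < best_cost:  # strict: prefer consolidating earlier on ties
            best_k, best_cost = k, cost
    return best_k, target, best_cost
```

For example, with inputs of 400, 100, and 50 units in three datacenters and stage ratios `[0.2, 5.0, 0.9]` (a selective filter, then an expanding step), the sketch consolidates at the largest datacenter after the first stage, moving about 30 units; if the first stage instead expanded the data (ratio 2.0), copying the raw input first would win. A real system would also weigh compute time and WAN bandwidth, which this model ignores.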
Keywords :
Big Data; cloud computing; computer centres; data structures; geophysics computing; PigLatin language; Rout; big data processing; cloud infrastructures; cloud vendors; data processing tasks; datacenter; efficient geo-distributed data processing; geo-distributed data structures; omnipresent computing abstraction; programming models; storage resources; supporting systems; Data processing; Data structures; Distributed databases; Java; Programming; Schedules; Syntactics;
Conference_Title :
Distributed Computing Systems (ICDCS), 2013 IEEE 33rd International Conference on
Conference_Location :
Philadelphia, PA
DOI :
10.1109/ICDCS.2013.23