Title :
A Lightweight Continuous Jobs Mechanism for MapReduce Frameworks
Author :
Vu, Trong-Tuan ; Huet, Fabrice
Author_Institution :
INRIA Lille Nord Europe, Lille, France
Abstract :
MapReduce is a programming model that allows vast amounts of data to be processed in parallel on a large number of machines. It is particularly well suited to static or slowly changing data sets, since the execution time of a job is usually high. In practice, however, data centers collect data at high rates, which makes it very difficult to keep results up to date. To address this challenge, we propose a generic mechanism for dealing with dynamic data in MapReduce frameworks. Long-standing MapReduce jobs, called continuous jobs, are automatically re-executed to process new incoming data at minimum cost. We present a simple and clean API that integrates well with the standard MapReduce model. Furthermore, we describe cHadoop, an implementation of our approach based on Hadoop that does not require modifications to the source code of the original framework. Thus, cHadoop can quickly be ported to any new version of Hadoop. We evaluate our proposal with two standard MapReduce applications (Word Count and Word Count-N-Count) and one real-world application (RDF Query) on real datasets. Our evaluations on clusters ranging from 5 to 40 nodes demonstrate the benefit of our approach in terms of execution time and ease of use.
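The idea of a continuous job in the abstract can be illustrated with a minimal Python sketch. This is not the cHadoop API, which the abstract does not show; the names `map_reduce_word_count`, `ContinuousWordCount`, and `on_new_data` are hypothetical and exist only for illustration. The sketch assumes a Word Count job whose earlier result can be carried over, so that each re-execution processes only newly arrived input and merges it into the previous counts.

```python
from collections import Counter
from typing import Iterable


def map_reduce_word_count(lines: Iterable[str]) -> Counter:
    """Plain batch Word Count: split each line into words (map),
    then sum the occurrences of each word (reduce)."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


class ContinuousWordCount:
    """Hypothetical continuous job (illustration only, not cHadoop):
    keeps the previous result and processes only new input."""

    def __init__(self) -> None:
        self.result = Counter()  # state carried over between runs
        self.runs = 0            # number of re-executions so far

    def on_new_data(self, new_lines: Iterable[str]) -> Counter:
        # Re-execute the job on the new data only, then merge the
        # partial counts into the carried-over result.
        self.result.update(map_reduce_word_count(new_lines))
        self.runs += 1
        return self.result


job = ContinuousWordCount()
job.on_new_data(["a b a"])   # first batch of data
job.on_new_data(["b c"])     # data that arrived later
```

After both batches, `job.result` holds the same counts as a single batch run over all the input, but the second run only had to read the new line. Word Count merges easily because its reduce step is a sum; jobs with other reduce functions may need different carried state.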
Keywords :
application program interfaces; data handling; message passing; parallel processing; API; MapReduce application; MapReduce framework; MapReduce programming model; RDF Query; Word Count-N-Count; cHadoop; data center; data collection; dynamic data; job execution time; lightweight continuous jobs mechanism; new incoming data processing; parallel data processing; publish-subscribe system; Computational modeling; Data models; Distributed databases; Programming; Proposals; Resource description framework; Standards; continuous MapReduce; publish/subscribe
Conference_Title :
2013 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)
Conference_Location :
Delft
Print_ISBN :
978-1-4673-6465-2
DOI :
10.1109/CCGrid.2013.36