DocumentCode
2052747
Title
Automatic Task Re-organization in MapReduce
Author
Guo, Zhenhua ; Pierce, Marlon ; Fox, Geoffrey ; Zhou, Mo
Author_Institution
Sch. of Inf. & Comput., Indiana Univ., Bloomington, IN, USA
fYear
2011
fDate
26-30 Sept. 2011
Firstpage
335
Lastpage
343
Abstract
MapReduce is increasingly regarded as a useful parallel programming model for large-scale data processing. It exploits parallelism among the executions of primitive map and reduce operations. Hadoop is an open-source implementation of MapReduce that has been used in both academic research and industry production. However, its implementation strategy, in which one map task processes one data block, limits the degree of concurrency and degrades performance because available resources cannot be fully utilized. In addition, its assumption that task execution time within each phase does not vary much does not always hold, which makes speculative execution useless. In this paper, we present mechanisms to dynamically split and consolidate tasks in order to balance load and break through the concurrency limit resulting from fixed task granularity. For single-job systems, two algorithms are proposed: one for the case where prior knowledge is available and one for the case where it is not. For multi-job cases, we propose a modified shortest-job-first strategy, which theoretically minimizes job turnaround time when combined with task splitting. We compared the effectiveness of our approach with the default task scheduling strategy using both synthesized and trace-based workloads. Simulation results show that our approach improves performance significantly.
Keywords
large-scale systems; parallel programming; resource allocation; scheduling; Hadoop; MapReduce; automatic task reorganization; large-scale data processing; load balancing; parallel programming; shortest-job-first strategy; task granularity; task scheduling; task splitting; Clustering algorithms; Concurrent computing; Data models; Educational institutions; Load management; Scheduling; Skeleton; Bag-of-Divisible-Tasks; Load Balancing; MapReduce; Task Splitting
fLanguage
English
Publisher
IEEE
Conference_Titel
Cluster Computing (CLUSTER), 2011 IEEE International Conference on
Conference_Location
Austin, TX
Print_ISBN
978-1-4577-1355-2
Electronic_ISBN
978-0-7695-4516-5
Type
conf
DOI
10.1109/CLUSTER.2011.44
Filename
6061152