DocumentCode :
2981703
Title :
ESAMR: An Enhanced Self-Adaptive MapReduce Scheduling Algorithm
Author :
Xiaoyu Sun ; Chen He ; Ying Lu
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. of Nebraska-Lincoln, Lincoln, NE, USA
fYear :
2012
fDate :
17-19 Dec. 2012
Firstpage :
148
Lastpage :
155
Abstract :
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Hadoop is an open-source implementation of Map Reduce, enjoying wide adoption, and is used not only for batch jobs but also for short jobs where low response time is critical. However, Hadoop´s performance is currently limited by its default task scheduler, which implicitly assumes that cluster nodes are homogeneous and tasks make progress linearly, and uses these assumptions to decide when to speculatively re-execute tasks that appear to be stragglers. In practice, the homogeneity assumption does not always hold. Longest Approximate Time to End (LATE) is a Map Reduce scheduling algorithm that takes heterogeneous environments into consideration. It, however, adopts a static method to compute the progress of tasks. As a result neither Hadoop default nor LATE schedulers perform well in a heterogeneous environment. Self-adaptive Map Reduce Scheduling Algorithm (SAMR) uses historical information to adjust stage weights of map and reduce tasks when estimating task execution times. However, SAMR does not consider the fact that for different types of jobs their map and reduce stage weights may be different. Even for the same type of jobs, different datasets may lead to different weights. To this end, we propose ESAMR: an Enhanced Self-Adaptive Map Reduce scheduling algorithm to improve the speculative re-execution of slow tasks in Map Reduce. In ESAMR, in order to identify slow tasks accurately, we differentiate historical stage weights information on each node and divide them into k clusters using a k-means clustering algorithm and when executing a job´s tasks on a node, ESAMR classifies the tasks into one of the clusters and uses the cluster´s weights to estimate the execution time of the job´s tasks on the node. Experimental results show that among the aforementioned algorithms, ESAMR leads to the smallest error in task execution time estimation and iden- ifies slow tasks most accurately.
Keywords :
approximation theory; distributed processing; pattern clustering; scheduling; ESAMR; Hadoops performance; LATE; distributed computing; enhanced self adaptive MapReduce scheduling algorithm; k-means clustering algorithm; longest approximate time to end; open source implementation; programming model; static method; Algorithm design and analysis; Clustering algorithms; Computational modeling; Hardware; History; Programming; Scheduling algorithms; MapReduce; heterogeneity; speculative task re-execution;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Parallel and Distributed Systems (ICPADS), 2012 IEEE 18th International Conference on
Conference_Location :
Singapore
ISSN :
1521-9097
Print_ISBN :
978-1-4673-4565-1
Electronic_ISBN :
1521-9097
Type :
conf
DOI :
10.1109/ICPADS.2012.30
Filename :
6413702
Link To Document :
بازگشت