مرکز منطقه ای اطلاع رساني علوم و فناوري - ESAMR: An Enhanced Self-Adaptive MapReduce Scheduling Algorithm

DocumentCode :

2981703

Title :

ESAMR: An Enhanced Self-Adaptive MapReduce Scheduling Algorithm

Author :

Xiaoyu Sun ; Chen He ; Ying Lu

Author_Institution :

Dept. of Comput. Sci. & Eng., Univ. of Nebraska-Lincoln, Lincoln, NE, USA

fYear :

2012

fDate :

17-19 Dec. 2012

Firstpage :

148

Lastpage :

155

Abstract :

MapReduce is a programming model and an associated implementation for processing and generating large data sets. Hadoop is an open-source implementation of Map Reduce, enjoying wide adoption, and is used not only for batch jobs but also for short jobs where low response time is critical. However, Hadoop´s performance is currently limited by its default task scheduler, which implicitly assumes that cluster nodes are homogeneous and tasks make progress linearly, and uses these assumptions to decide when to speculatively re-execute tasks that appear to be stragglers. In practice, the homogeneity assumption does not always hold. Longest Approximate Time to End (LATE) is a Map Reduce scheduling algorithm that takes heterogeneous environments into consideration. It, however, adopts a static method to compute the progress of tasks. As a result neither Hadoop default nor LATE schedulers perform well in a heterogeneous environment. Self-adaptive Map Reduce Scheduling Algorithm (SAMR) uses historical information to adjust stage weights of map and reduce tasks when estimating task execution times. However, SAMR does not consider the fact that for different types of jobs their map and reduce stage weights may be different. Even for the same type of jobs, different datasets may lead to different weights. To this end, we propose ESAMR: an Enhanced Self-Adaptive Map Reduce scheduling algorithm to improve the speculative re-execution of slow tasks in Map Reduce. In ESAMR, in order to identify slow tasks accurately, we differentiate historical stage weights information on each node and divide them into k clusters using a k-means clustering algorithm and when executing a job´s tasks on a node, ESAMR classifies the tasks into one of the clusters and uses the cluster´s weights to estimate the execution time of the job´s tasks on the node. Experimental results show that among the aforementioned algorithms, ESAMR leads to the smallest error in task execution time estimation and iden- ifies slow tasks most accurately.

Keywords :

approximation theory; distributed processing; pattern clustering; scheduling; ESAMR; Hadoops performance; LATE; distributed computing; enhanced self adaptive MapReduce scheduling algorithm; k-means clustering algorithm; longest approximate time to end; open source implementation; programming model; static method; Algorithm design and analysis; Clustering algorithms; Computational modeling; Hardware; History; Programming; Scheduling algorithms; MapReduce; heterogeneity; speculative task re-execution;

fLanguage :

English

Publisher :

ieee

Conference_Titel :

Parallel and Distributed Systems (ICPADS), 2012 IEEE 18th International Conference on

Conference_Location :

Singapore

ISSN :

1521-9097

Print_ISBN :

978-1-4673-4565-1

Electronic_ISBN :

1521-9097

Type :

conf

DOI :

10.1109/ICPADS.2012.30

Filename :

6413702

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2981703