DocumentCode :
2206116
Title :
SAMR: A Self-adaptive MapReduce Scheduling Algorithm in Heterogeneous Environment
Author :
Chen, Quan ; Zhang, Daqiang ; Guo, Minyi ; Deng, Qianni ; Guo, Song
Author_Institution :
Dept. of Comput. Sci., Shanghai Jiao Tong Univ., Shanghai, China
fYear :
2010
fDate :
June 29 2010-July 1 2010
Firstpage :
2736
Lastpage :
2743
Abstract :
Hadoop is seriously limited by its MapReduce scheduler, which does not scale well in heterogeneous environments. A heterogeneous environment is characterized by devices that vary greatly in computation and communication capacity, architecture, memory, and power. As an important extension of Hadoop, the LATE MapReduce scheduling algorithm takes heterogeneous environments into consideration. However, it falls short of solving the crucial problem: poor performance caused by the static manner in which it computes the progress of tasks. Consequently, neither the Hadoop scheduler nor the LATE scheduler is desirable in a heterogeneous environment. To this end, we propose SAMR, a Self-Adaptive MapReduce scheduling algorithm, which calculates the progress of tasks dynamically and adapts to the continuously varying environment automatically. When a job is committed, SAMR splits the job into many fine-grained map and reduce tasks, then assigns them to a series of nodes. Meanwhile, it reads historical information that is stored on every node and updated after every execution. SAMR then adjusts the time weight of each stage of the map and reduce tasks according to this historical information. It thus obtains the progress of each task accurately and finds which tasks need backup tasks. Moreover, it identifies slow nodes and classifies them into sets of slow nodes dynamically. Using this information, SAMR does not launch backup tasks on slow nodes, ensuring that the backup tasks will not themselves become slow tasks. It takes the final result of each fine-grained task from whichever of the slow task or its backup task finishes first. The proposed algorithm is evaluated by extensive experiments over various heterogeneous environments. Experimental results show that SAMR decreases execution time by up to 25% compared with Hadoop's scheduler and by up to 14% compared with the LATE scheduler.
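The scheduling idea summarized in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration and not the authors' implementation: the names `Task`, `progress`, `time_left`, `update_weights`, and `pick_backup` are hypothetical, the blending factor `alpha` is invented, and the remaining-time estimate (remaining progress divided by the observed progress rate) is a LATE-style heuristic that the abstract does not spell out.

```python
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    node: str              # node the task is currently running on
    stage: int             # index of the stage currently executing
    stage_fraction: float  # fraction of the current stage completed, 0..1
    elapsed: float         # seconds since the task started


def progress(task, weights):
    """Weighted progress: finished stages count in full, the current one partially."""
    done = sum(weights[:task.stage])
    return done + weights[task.stage] * task.stage_fraction


def time_left(task, weights):
    """Assumed heuristic: remaining progress divided by the observed progress rate."""
    p = progress(task, weights)
    if p <= 0 or task.elapsed <= 0:
        return float("inf")
    return (1.0 - p) / (p / task.elapsed)


def update_weights(stored, observed, alpha=0.5):
    """Blend stored stage weights with newly observed ones, then renormalize.

    alpha is a hypothetical smoothing factor, not a value from the paper.
    """
    blended = [alpha * s + (1.0 - alpha) * o for s, o in zip(stored, observed)]
    total = sum(blended)
    return [b / total for b in blended]


def pick_backup(tasks, weights, slow_nodes, free_nodes):
    """Choose the task with the longest estimated time left and a node not marked slow."""
    fast_nodes = [n for n in free_nodes if n not in slow_nodes]
    if not tasks or not fast_nodes:
        return None
    slowest = max(tasks, key=lambda t: time_left(t, weights))
    return slowest.task_id, fast_nodes[0]


if __name__ == "__main__":
    weights = [0.25, 0.75]  # hypothetical per-stage time weights read from history
    running = [
        Task("t1", "n1", stage=1, stage_fraction=0.5, elapsed=10.0),
        Task("t2", "n2", stage=0, stage_fraction=0.5, elapsed=10.0),
    ]
    print(pick_backup(running, weights, slow_nodes={"n3"}, free_nodes=["n3", "n4"]))
```

In this sketch the stage weights come from history rather than being fixed constants, which is the contrast with a static progress computation that the abstract draws; the slow-node set only filters where a backup task may be placed.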
Keywords :
cartography; scheduling; Hadoop; LATE MapReduce scheduling algorithm; SAMR; heterogeneous environment; self-adaptive mapreduce scheduling algorithm; Cloud computing; Computer architecture; Data models; History; Programming; Scheduling algorithm; Tuning; Heterogeneous environment; MapReduce; Scheduling algorithm; Self-adaptive;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer and Information Technology (CIT), 2010 IEEE 10th International Conference on
Conference_Location :
Bradford
Print_ISBN :
978-1-4244-7547-6
Type :
conf
DOI :
10.1109/CIT.2010.458
Filename :
5578538