DocumentCode :
2183626
Title :
Self-Learning MapReduce Scheduler in Multi-job Environment
Author :
Changhang Lin ; Wenzhong Guo ; Changhui Lin
Author_Institution :
Coll. of Math. & Comput. Sci., Fuzhou Univ., Fuzhou, China
fYear :
2013
fDate :
16-19 Dec. 2013
Firstpage :
610
Lastpage :
612
Abstract :
Hadoop, as the most widely adopted open-source implementation of the MapReduce framework, makes MapReduce widely accessible. However, it is currently limited by its default MapReduce scheduler. To achieve better performance, the scheduler should take into consideration nodes' computing power and system resources in a heterogeneous environment. Furthermore, from the job perspective, tasks' non-linear progress is also an important factor. Some research has been carried out to enhance the performance of MapReduce, but the results are not satisfactory in terms of considering the characteristics of both nodes and jobs. To overcome this drawback, we propose a Self-Learning MapReduce Scheduler (SLM), which outperforms the existing schedulers in a multi-job environment. Since competition for system resources may make a task's progress unpredictable, SLM determines the progress of each job based on its own historical information. In particular, in the self-learning stage of a job, SLM uses feedback information from the first few tasks to calculate the task phase weights. With these phase weights, SLM can obtain a more accurate execution time estimate, which is the most important condition for finding stragglers (slow tasks). Experimental results show that SLM can effectively improve the accuracy of execution time estimation and straggler identification, leading to rational utilization of resources and shorter job execution times, especially in a multi-job environment.
Keywords :
parallel processing; resource allocation; scheduling; Hadoop; MapReduce framework; SLM; execution time estimation; feedback information; heterogeneous environment; historical information; multijob environment; resource utilization; self-learning MapReduce scheduler; straggler identification; Accuracy; Cloud computing; Computational modeling; Dynamic scheduling; Educational institutions; Estimation; Hadoop; heterogeneous environment; multi-job; speculative execution; straggler;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cloud Computing and Big Data (CloudCom-Asia), 2013 International Conference on
Conference_Location :
Fuzhou
Print_ISBN :
978-1-4799-2829-3
Type :
conf
DOI :
10.1109/CLOUDCOM-ASIA.2013.95
Filename :
6821057