DocumentCode :
3144196
Title :
An effective data locality aware task scheduling method for MapReduce framework in heterogeneous environments
Author :
Zhang, Xiaohong ; Feng, Yuhong ; Feng, Shengzhong ; Fan, Jianping ; Ming, Zhong
Author_Institution :
Sch. of Comput. Sci. & Technol., Henan Polytech. Univ., Jiaozuo, China
fYear :
2011
fDate :
12-14 Dec. 2011
Firstpage :
235
Lastpage :
242
Abstract :
Data locality has recently been extensively exploited in Cloud computing to improve system performance. However, when scheduling Map tasks in the Hadoop MapReduce framework in a heterogeneous environment, existing methods either cannot reduce the occurrence of these Map tasks or harm fairness, thus degrading system performance. To address this problem, this paper proposes a data locality aware scheduling method to improve Hadoop MapReduce system performance in heterogeneous computing environments. After receiving a request from a requesting node, our method preferentially schedules the task whose input data is stored on the requesting node. If no such task exists, our method selects the task whose input data is nearest to the requesting node, and then decides whether to reserve the task for the node storing the input data or to schedule the task to the requesting node by transferring the input data to it on the fly. As a proof of concept, we implement the method in Hadoop-0.20.2. To evaluate performance, we experimentally compare our proposed method against the default scheduling method used in Hadoop-0.20.2. The experimental results show that our proposed method improves data locality and reduces both the normalized execution time and the response time of jobs.
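The three-step decision described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' Hadoop-0.20.2 implementation: the names `MapTask`, `Decision`, and `schedule` are hypothetical, each task is assumed to have a single input replica, and the reserve-versus-transfer rule (compare an estimated wait time against an estimated transfer time) is an assumption, since the abstract does not state how that decision is made.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class MapTask:
    task_id: str
    data_node: str  # node holding the input data (single replica assumed)


@dataclass
class Decision:
    task: Optional[MapTask]
    action: str  # "local", "reserve", "transfer", or "none"


def schedule(
    requesting_node: str,
    pending: List[MapTask],
    distance: Callable[[str, str], float],
    wait_time: Callable[[str], float],
    transfer_time: Callable[[str, str], float],
) -> Decision:
    """Pick a Map task for the node that asked for work."""
    if not pending:
        return Decision(None, "none")

    # Step 1: prefer a task whose input data is already on the requesting node.
    for task in pending:
        if task.data_node == requesting_node:
            return Decision(task, "local")

    # Step 2: otherwise take the task whose input data is nearest.
    nearest = min(pending, key=lambda t: distance(t.data_node, requesting_node))

    # Step 3 (ASSUMED rule): reserve the task for the data-holding node if that
    # node is expected to become free sooner than the data could be transferred;
    # otherwise run it on the requesting node and transfer the input.
    if wait_time(nearest.data_node) <= transfer_time(nearest.data_node, requesting_node):
        return Decision(nearest, "reserve")
    return Decision(nearest, "transfer")
```

In this sketch, `distance`, `wait_time`, and `transfer_time` are supplied by the caller, so the same function can be exercised with a rack-based topology or with simple constants.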
Keywords :
cloud computing; distributed processing; scheduling; software performance evaluation; Hadoop MapReduce framework; Hadoop-0.20.2; cloud computing; data locality aware task scheduling method; default scheduling method; heterogeneous environments; performance evaluation; requesting node; system performance improvement; Cloud computing; Data communication; Distributed databases; Processor scheduling; Schedules; System performance; Time factors; Cloud Computing; Data Locality; Distributed Computing; MapReduce; Task Scheduling;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Cloud and Service Computing (CSC), 2011 International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
978-1-4577-1635-5
Electronic_ISBN :
978-1-4577-1636-2
Type :
conf
DOI :
10.1109/CSC.2011.6138527
Filename :
6138527