Title :
Improving Data Locality of MapReduce by Scheduling in Homogeneous Computing Environments
Author :
Zhang, Xiaohong ; Zhong, Zhiyong ; Feng, Shengzhong ; Tu, Bibo ; Fan, Jianping
Author_Institution :
Inst. of Comput. Technol., Chinese Acad. of Sci., Beijing, China
Abstract :
Data locality is one of the critical factors affecting MapReduce performance. This paper proposes a next-k-node scheduling (NKS) method to improve the data locality of map tasks. The method first calculates a probability for each map task and then preferentially schedules the task with the highest probability. It assigns low probabilities to tasks that satisfy node locality with the nodes about to issue requests, so these tasks are reserved for those nodes. We have implemented the NKS method in hadoop-0.20.2. Experimental results show that, compared with the default task scheduling method in Hadoop, the NKS method reduced the number of map tasks processed without node locality by 78%, reduced the network load caused by these tasks by 77%, and improved the performance of Hadoop MapReduce. The NKS method is therefore well suited to homogeneous environments with network overload.
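The selection rule described in the abstract can be illustrated with a minimal sketch. The abstract does not give the probability formula or say how the next k requesting nodes are predicted, so everything below is an assumption made for illustration: the names `MapTask`, `score`, and `pick_task` are hypothetical, the list of upcoming nodes is supplied by the caller, and the score is a simple stand-in that decreases as more upcoming requesters hold a replica of the task's input. It is not the authors' implementation and does not use any Hadoop API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapTask:
    task_id: str
    replica_nodes: frozenset  # nodes holding a replica of this task's input split


def score(task, next_k_nodes):
    """Hypothetical stand-in for the paper's probability (formula assumed).

    The score drops as more of the upcoming requesters hold the task's
    input locally, so such tasks tend to be held back for those nodes.
    """
    overlap = len(task.replica_nodes & set(next_k_nodes))
    return 1.0 / (1 + overlap)


def pick_task(requesting_node, next_k_nodes, pending):
    """Choose a pending map task for requesting_node, or None if none remain."""
    if not pending:
        return None
    # Assumption: tasks that are node-local to the requester are considered first.
    local = [t for t in pending if requesting_node in t.replica_nodes]
    candidates = local if local else pending
    # Schedule the candidate with the highest score (first one wins on ties).
    return max(candidates, key=lambda t: score(t, next_k_nodes))


pending = [
    MapTask("t1", frozenset({"n2", "n3"})),
    MapTask("t2", frozenset({"n5", "n6"})),
    MapTask("t3", frozenset({"n1", "n2"})),
]
# n4 holds no input data; t1 and t3 are local to upcoming requesters n2 and n3,
# so the sketch hands n4 the task that no upcoming node could run locally.
chosen = pick_task("n4", ["n2", "n3"], pending)
print(chosen.task_id)
```

In this sketch a node without local data receives the task that is least useful to the nodes expected to request work next, which is one plausible reading of how low probabilities "reserve" tasks for those nodes.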
Keywords :
data analysis; parallel processing; probability; scheduling; task analysis; Hadoop MapReduce; NKS method; data locality; hadoop-0.20.2; homogeneous computing; map tasks; next-k-node scheduling; Data models; Distributed databases; Radio access networks; Schedules; Topology; MapReduce; cloud computing; distributed computing; network load; task scheduling
Conference_Title :
Parallel and Distributed Processing with Applications (ISPA), 2011 IEEE 9th International Symposium on
Conference_Location :
Busan
Print_ISBN :
978-1-4577-0391-1
Electronic_ISBN :
978-0-7695-4428-1
DOI :
10.1109/ISPA.2011.14