Title :
Center-of-Gravity Reduce Task Scheduling to Lower MapReduce Network Traffic
Author :
Hammoud, Mohammad ; Rehman, M. Suhail ; Sakr, Majd F.
Author_Institution :
Carnegie Mellon Univ. in Qatar, Doha, Qatar
Abstract :
MapReduce is one of the most successful realizations of large-scale, data-intensive cloud computing platforms. It automatically parallelizes computation by running multiple map and/or reduce tasks over data distributed across multiple machines. Hadoop is an open source implementation of MapReduce. When Hadoop schedules reduce tasks, it neither exploits data locality nor addresses the partitioning skew present in some MapReduce applications, which can increase cluster network traffic. In this paper we investigate the problems of data locality and partitioning skew in Hadoop. We propose the Center-of-Gravity Reduce Scheduler (CoGRS), a locality-aware, skew-aware reduce task scheduler for saving MapReduce network traffic. To exploit data locality, CoGRS schedules each reduce task at its center-of-gravity node, which is computed with partitioning skew taken into account. We implemented CoGRS in Hadoop-0.20.2 and tested it on a private cloud as well as on Amazon EC2. Compared to native Hadoop, our results show that CoGRS reduces off-rack network traffic by averages of 9.6% and 38.6% on our private cloud and on an Amazon EC2 cluster, respectively. This reduction carries over to job execution times, which improve by up to 23.8%.
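The abstract does not spell out how the center-of-gravity node is computed. The following is only a minimal sketch of one plausible reading, not the authors' implementation: pick the node that minimizes the total distance to the map outputs feeding a reduce task, with each distance weighted by the size of the partition held there, so that skewed (larger) partitions pull the reduce task toward them. The distance values (0 for the same node, 2 for the same rack, 4 across racks), the function names, and the sample cluster are all assumptions made for illustration.

```python
from typing import Dict


def network_distance(a: str, b: str, rack_of: Dict[str, str]) -> int:
    """Assumed distance model: 0 same node, 2 same rack, 4 across racks."""
    if a == b:
        return 0
    return 2 if rack_of[a] == rack_of[b] else 4


def center_of_gravity(partition_bytes: Dict[str, int],
                      rack_of: Dict[str, str]) -> str:
    """Return the node with the smallest size-weighted total distance
    to the nodes holding this reduce task's input partitions."""
    def weighted_cost(candidate: str) -> int:
        return sum(size * network_distance(src, candidate, rack_of)
                   for src, size in partition_bytes.items())

    # Sorting the candidates makes tie-breaking deterministic.
    return min(sorted(rack_of), key=weighted_cost)


# Hypothetical 4-node, 2-rack cluster; n3 holds a heavily skewed partition.
rack_of = {"n1": "rackA", "n2": "rackA", "n3": "rackB", "n4": "rackB"}
partition_bytes = {"n1": 10, "n2": 10, "n3": 500}
chosen = center_of_gravity(partition_bytes, rack_of)
print(chosen)
```

In this hypothetical input, most of the reduce task's data sits on n3, so a size-weighted choice favors rackB even though two of the three feeding nodes are in rackA; an unweighted choice would ignore that skew.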
Keywords :
cloud computing; data analysis; public domain software; scheduling; software performance evaluation; ubiquitous computing; Amazon EC2 cluster; CoGRS; Hadoop-0.20.2; MapReduce network traffic; center-of-gravity reduce task scheduling; cluster network traffic; data locality; large-scale data-intensive cloud computing platforms; locality-aware skew-aware reduce task scheduler; off-rack network traffic minimization; open source implementation; partitioning skew; task reduction; ubiquitous programming model; Bandwidth; Benchmark testing; Cloud computing; Distributed databases; Network topology; Schedules; Scheduling; Hadoop; MapReduce; Reduce Task Scheduling;
Conference_Title :
Cloud Computing (CLOUD), 2012 IEEE 5th International Conference on
Conference_Location :
Honolulu, HI
Print_ISBN :
978-1-4673-2892-0
DOI :
10.1109/CLOUD.2012.92