Title :
Performance Optimization for Short MapReduce Job Execution in Hadoop
Author :
Jinshuang Yan ; XiaoLiang Yang ; Rong Gu ; Chunfeng Yuan ; Yihua Huang
Author_Institution :
Department of Computer Science and Technology, Nanjing University, Nanjing, China
Abstract :
Hadoop MapReduce is a widely used parallel computing framework for solving data-intensive problems. To process large-scale datasets, the standard Hadoop design emphasizes high data throughput over job execution performance. This limits performance when Hadoop MapReduce is used to execute short jobs that require quick responses. To speed up the execution of short jobs, this paper proposes optimization methods that improve the execution performance of MapReduce jobs. We made three major optimizations: first, we reduce the time spent in the initialization and termination stages of a job by optimizing its setup and cleanup tasks; second, we replace the pull-model task assignment mechanism with a push model; third, we replace the heartbeat-based communication mechanism with an instant message communication mechanism for event notifications between the JobTracker and the TaskTrackers. Experimental results show that, for our test application, job execution in our improved version of Hadoop is on average about 23% faster than in standard Hadoop.
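Why the second and third optimizations matter most for short jobs can be illustrated with a toy simulation. This is a hypothetical sketch, not the authors' implementation. It assumes a single TaskTracker with a fixed heartbeat interval (3 s, an assumed value), a job modeled as a sequence of stages with made-up durations, and that under the pull/heartbeat model every task assignment and the final completion report wait for the next heartbeat, while under the push/instant-message model they happen immediately. The names `next_heartbeat`, `job_completion_time` and `mean_overhead` are illustrative only.

```python
import math
import random


def next_heartbeat(t, interval, phase):
    """First heartbeat time >= t for a tracker beating at phase + k * interval."""
    if t <= phase:
        return phase
    k = math.ceil((t - phase) / interval)
    return phase + k * interval


def job_completion_time(stage_durations, interval, phase, push):
    """Time until the master learns that the last stage has finished.

    push=False (assumed pull model + heartbeat reporting): each task is handed
    out, and the final completion is reported, only on a heartbeat.
    push=True (assumed push model + instant messages): no waiting at all.
    """
    t = 0.0  # job submitted at time 0
    for duration in stage_durations:
        # Under pull, a finished stage is reported on the next heartbeat and the
        # next task is assigned in that same heartbeat response.
        start = t if push else next_heartbeat(t, interval, phase)
        t = start + duration
    return t if push else next_heartbeat(t, interval, phase)


def mean_overhead(stage_durations, interval, trials=20000, seed=42):
    """Average extra time of the pull/heartbeat model over random heartbeat phases."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        phase = rng.uniform(0.0, interval)
        pull = job_completion_time(stage_durations, interval, phase, push=False)
        push = job_completion_time(stage_durations, interval, phase, push=True)
        total += pull - push
    return total / trials


if __name__ == "__main__":
    stages = [1.0, 2.0, 1.0]  # hypothetical setup / map / cleanup durations (s)
    heartbeat = 3.0           # assumed heartbeat interval (s)
    print(job_completion_time(stages, heartbeat, phase=0.5, push=False))  # 9.5
    print(job_completion_time(stages, heartbeat, phase=0.5, push=True))   # 4.0
    print(round(mean_overhead(stages, heartbeat), 1))
```

In this toy model a job whose useful work takes 4 s needs 9.5 s under the pull/heartbeat model for one particular heartbeat phase; the waiting overhead grows with the heartbeat interval and with the number of stage boundaries, so it dominates when the stages themselves are short.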
Keywords :
parallel processing; Hadoop MapReduce job execution; data intensive problems; event notification; heartbeat based communication mechanism; job tracker; parallel computing framework; performance optimization; pull model task assignment mechanism; push model; task trackers; Algorithm design and analysis; Delay; Heart beat; Optimization methods; Standards; MapReduce; job execution; parallel computing
Conference_Title :
2012 Second International Conference on Cloud and Green Computing (CGC)
Conference_Location :
Xiangtan
Print_ISBN :
978-1-4673-3027-5
DOI :
10.1109/CGC.2012.40