DocumentCode :
3588659
Title :
MRTune: A simulator for performance tuning of MapReduce jobs with skewed data
Author :
Xibo Zhou ; Wuman Luo ; Haoyu Tan
Author_Institution :
Guangzhou HKUST Fok Ying Tung Res. Inst., Hong Kong Univ. of Sci. & Technol., Hong Kong, China
fYear :
2014
Firstpage :
352
Lastpage :
359
Abstract :
MapReduce is a programming model designed by Google that has been widely used for both high-performance computing and big data processing. Although the programming model is simple, performance tuning of a MapReduce job is very challenging, given the complexity of the configuration parameters and the various tradeoffs between the performance gain of the optimization approaches and the extra overhead they bring about. One naive way to address this issue is to run the MapReduce jobs repeatedly using different combinations of configuration parameters and optimization methods, then select the one with the shortest running time. However, real execution is impractical because there may be too many combinations and a single run of each combination may take too long. Therefore, it is desirable to estimate the runtime of a job efficiently, without real execution, using only the input data and the configuration parameter settings of the cluster. In this paper, we propose a novel MapReduce simulator called MRTune for runtime estimation of MapReduce jobs. MRTune takes the key distribution of the input data into consideration and works well even when that distribution is skewed. Moreover, MRTune can estimate the runtime of a job in the presence of unpredictable task failures. We evaluate MRTune on MapReduce jobs with Zipfian-distributed input data. The results show that MRTune can estimate the runtime of MapReduce jobs with high accuracy and efficiency even when the key distribution of the input data is skewed. We also conduct two case studies to analyse the impact of data skew and task failures on a MapReduce job.
Keywords :
data handling; parallel programming; software performance evaluation; Google; MRTune; MRTune implementation; MapReduce jobs; Reduce simulator; Zipfian distributed input data; configuration parameter complexities; job runtime estimation; optimization approach; overhead; performance gain; performance tuning; programming model; skewed data; unpredictable task failures; Complexity theory; Computational modeling; Data models; Tuning; MapReduce; performance tuning; runtime estimation; simulator; skew
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel and Distributed Systems (ICPADS), 2014 20th IEEE International Conference on
Type :
conf
DOI :
10.1109/PADSW.2014.7097828
Filename :
7097828