Title :
Performance modeling and optimization of MapReduce programs
Author :
Jinsong Yin ; Yuanyuan Qiao
Author_Institution :
Beijing Key Lab. of Network Syst. Archit. & Convergence, Beijing Univ. of Posts & Telecommun., Beijing, China
Abstract :
MapReduce is a developer-friendly framework that encapsulates the underlying complexities of distributed computing. It is increasingly used across enterprises for advanced data analytics, business intelligence, and data mining tasks. However, two questions trouble Hadoop users: how to improve the performance of MapReduce workloads, and how to estimate the time needed to run a MapReduce job. In this paper, we present performance optimization techniques based on workload characterization. Once the cluster has been tuned to achieve its best performance, we further propose a modeling method that helps Hadoop users estimate the execution time of MapReduce jobs. We evaluate the accuracy of our techniques using typical benchmarks.
Keywords :
competitive intelligence; data analysis; data mining; parallel processing; Hadoop; MapReduce programs; advanced data analytics; business intelligence; data mining tasks; developer-friendly framework; distributed computing; performance modeling; performance optimization techniques; Benchmark testing; Business; Optimization; Reliability; MapReduce; Performance optimization; Time modeling; Workload characterization
Conference_Title :
Cloud Computing and Intelligence Systems (CCIS), 2014 IEEE 3rd International Conference on
Print_ISBN :
978-1-4799-4720-1
DOI :
10.1109/CCIS.2014.7175726