DocumentCode :
3497150
Title :
Breaking the boundary for whole-system performance optimization of big data
Author :
Yan Li ; Kun Wang ; Qi Guo ; Xin Li ; Xiaochen Zhang ; Guancheng Chen ; Tao Liu ; Jian Li
Author_Institution :
IBM Res. - China, China
fYear :
2013
fDate :
4-6 Sept. 2013
Firstpage :
126
Lastpage :
131
Abstract :
MapReduce plays a critical role in finding insights in Big Data. Optimizing the performance of MapReduce programs is challenging because it requires a comprehensive understanding of the whole system, including both hardware layers (processors, storage, networks, etc.) and software stacks (operating systems, JVM, runtime, applications, etc.). However, most existing performance tuning and optimization relies on empirical and heuristic attempts. How to build a systematic framework that breaks the boundaries between layers for performance optimization remains an open question. In this paper, we propose a performance evaluation framework that correlates performance metrics from different layers, providing insights to efficiently pinpoint performance issues. The framework is composed of a series of predefined patterns, each of which indicates one or more potential issues. The behavior of a MapReduce program is mapped to the corresponding resource utilization. The framework provides a holistic approach that allows users at different levels of experience to optimize MapReduce program performance. We use the Terasort benchmark running on a 10-node Power7R2 cluster as a real case to show how this framework improves performance. Using this framework, we improve the Terasort result from 47 minutes to less than 8 minutes. In addition to best practices for performance tuning, several key findings are summarized as valuable workload analysis for JVM, MapReduce runtime, and application design.
Keywords :
data handling; optimisation; parallel programming; performance evaluation; resource allocation; 10-node Power7R2 cluster; JVM; MapReduce program performance optimization; Terasort benchmark; big data; hardware layers; performance evaluation framework; performance tuning; resource utilization; software stacks; whole-system performance optimization; Hardware; Indexes; Java; Optimization; Runtime; Software; Tuning;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Low Power Electronics and Design (ISLPED), 2013 IEEE International Symposium on
Conference_Location :
Beijing
Print_ISBN :
978-1-4799-1234-6
Type :
conf
DOI :
10.1109/ISLPED.2013.6629278
Filename :
6629278