Title :
Diagnosing Virtualized Hadoop Performance from Benchmark Results: An Exploratory Study
Author :
Jun Fan; Xinhui Li; Chi Harold Liu; Jeffrey Buell; Guo-Quan Lu; Li Lu
Author_Institution :
VMware R&D, Beijing Inst. of Technol., Beijing, China
fDate :
June 27 2014-July 2 2014
Abstract :
Hadoop is emerging as one of the leading frameworks that enterprises use to make better business decisions from large data sets. Virtualization brings many benefits to Hadoop, including higher resource utilization and cluster reliability. However, these benefits mean nothing to users if moving from a physical to a virtual platform causes unacceptable performance degradation. Existing work on virtualized Hadoop performance finds that improper network and storage configurations in open-source virtual deployments impose large overhead on system performance. Moreover, the complexity of the hardware and software stack, including virtualization configurations and varying deployment scales, still makes performance tuning too hard to carry out in practice. To bridge this gap in virtualized Hadoop adoption, we propose a performance diagnostic methodology that integrates statistical analysis from different layers, and we design a heuristic performance diagnostic tool that evaluates the validity and correctness of virtualized Hadoop by analyzing the job traces of popular big data benchmarks. With this tool, users can quickly identify the bottleneck from the hints it provides, confirm the diagnosis with the performance utilities of the guest OS and hypervisor, and continue tuning virtualized Hadoop through repeated runs of the tool.
Keywords :
Big Data; operating systems (computers); public domain software; statistical analysis; virtualisation; Big data benchmarks; cluster reliability; guest OS; hardware complexity; heuristic performance diagnostic tool; hypervisor; open sourced virtual deployment; performance diagnostic methodology; resource utilization; software complexity; virtualization configurations; virtualization technology; virtualized Hadoop performance diagnosis; Benchmark testing; Degradation; History; Measurement; Tuning; Virtualization
Conference_Title :
Big Data (BigData Congress), 2014 IEEE International Congress on
Conference_Location :
Anchorage, AK
Print_ISBN :
978-1-4799-5056-0
DOI :
10.1109/BigData.Congress.2014.89