• DocumentCode
    249448
  • Title

    Diagnosing Virtualized Hadoop Performance from Benchmark Results: An Exploratory Study

  • Author

    Jun Fan ; Xinhui Li ; Liu, Chi Harold ; Buell, Jeffrey ; Lu, Guo-Quan ; Lu, Li

  • Author_Institution
    VMware R&D, Beijing Inst. of Technol., Beijing, China
  • fYear
    2014
  • fDate
    June 27 2014-July 2 2014
  • Firstpage
    578
  • Lastpage
    585
  • Abstract
    Hadoop is emerging as one of the leading frameworks used by enterprises to make better business decisions on large data sets. Virtualization technology brings many benefits to Hadoop, including higher resource utilization and cluster reliability. However, these benefits mean nothing to users if performance degrades unacceptably when moving from a physical to a virtual platform. Existing work on virtualized Hadoop performance finds that improper network and storage configurations in open-source virtual deployments impose a large overhead on system performance. Moreover, the complexity of the hardware and software, including virtualization configurations and varying deployment scales, makes performance tuning too difficult to carry out in practice. To close this gap in virtualized Hadoop adoption, in this paper we propose a performance diagnostic methodology that integrates statistical analysis from different layers, and we design a heuristic performance diagnostic tool that evaluates the validity and correctness of virtualized Hadoop by analyzing the job traces of popular big data benchmarks. With this tool, users can quickly identify a bottleneck from the hints it provides, confirm the diagnosis using the performance utilities provided by the guest OS and hypervisor, and continue tuning virtualized Hadoop performance through repeated runs of the tool.
  • Keywords
    Big Data; operating systems (computers); public domain software; statistical analysis; virtualisation; Big data benchmarks; cluster reliability; guest OS; hardware complexity; heuristic performance diagnostic tool; hypervisor; open sourced virtual deployment; performance diagnostic methodology; resource utilization; software complexity; statistical analysis; virtualization configurations; virtualization technology; virtualized Hadoop performance diagnosis; Benchmark testing; Big data; Degradation; History; Measurement; Tuning; Virtualization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Big Data (BigData Congress), 2014 IEEE International Congress on
  • Conference_Location
    Anchorage, AK
  • Print_ISBN
    978-1-4799-5056-0
  • Type

    conf

  • DOI
    10.1109/BigData.Congress.2014.89
  • Filename
    6906831