• DocumentCode
    154161
  • Title

    Simulating Big Data Clusters for System Planning, Evaluation, and Optimization

  • Author

    Zhaojuan Bian ; Kebing Wang ; Zhihong Wang ; Gene Munce ; Illia Cremer ; Wei Zhou ; Qian Chen ; Gen Xu

  • Author_Institution
    Software & Service Group, Intel Corp., Shanghai, China
  • fYear
    2014
  • fDate
    9-12 Sept. 2014
  • Firstpage
    391
  • Lastpage
    400
  • Abstract
    With the fast development of big data technologies, IT spending on computer clusters is increasing rapidly as well. In order to minimize the cost, architects must plan big data clusters with careful evaluation of various design choices. Current capacity planning methods are mostly based on trial and error or high-level estimation. These approaches, however, are far from efficient, especially with the increasing hardware diversity and software stack complexity. In this paper, we present CSMethod, a novel cluster simulation methodology, to facilitate efficient cluster capacity planning, performance evaluation, and optimization before system provisioning. With our proposed methodology, software stacks are simulated by an abstract yet high-fidelity model. Hardware activities derived from software operations are dynamically mapped onto architecture models for processors, memory, storage, and networking devices. This hardware/software hybrid methodology allows low-overhead, fast, and accurate cluster simulation that can be easily carried out on a standard client platform (desktop or laptop). Our experimental results with six popular Hadoop workloads demonstrate that CSMethod can achieve an average error rate of less than six percent across various software parameters and cluster hardware configurations. We also illustrate the application of the proposed methodology with two real-world use cases: video-streaming service system planning and Terasort cluster optimization. All our experiments are run on a commodity laptop with execution speeds faster than native executions on a multi-node high-end cluster.
  • Keywords
    Big Data; middleware; performance evaluation; video streaming; CSMethod; Hadoop workloads; Terasort cluster optimization; big data clusters; cluster capacity planning; cluster hardware configurations; cluster simulation methodology; computer clusters; hardware diversity; hardware-software hybrid methodology; software parameters; software stack complexity; system evaluation; system optimization; system provisioning; video-streaming service system planning; Computational modeling; Computer architecture; Hardware; Program processors; Unified modeling language; cluster simulation; data center capacity planning; performance modeling
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Processing (ICPP), 2014 43rd International Conference on
  • Conference_Location
    Minneapolis, MN
  • ISSN
    0190-3918
  • Type

    conf

  • DOI
    10.1109/ICPP.2014.48
  • Filename
    6957248