Title :
Simulating Big Data Clusters for System Planning, Evaluation, and Optimization
Author :
Zhaojuan Bian ; Kebing Wang ; Zhihong Wang ; Gene Munce ; Illia Cremer ; Wei Zhou ; Qian Chen ; Gen Xu
Author_Institution :
Software & Service Group, Intel Corp., Shanghai, China
Abstract :
With the rapid development of big data technologies, IT spending on computer clusters is also increasing rapidly. To minimize cost, architects must plan big data clusters by carefully evaluating various design choices. Current capacity planning methods are mostly based on trial and error or on high-level estimation. These approaches, however, are far from efficient, especially given increasing hardware diversity and software stack complexity. In this paper, we present CSMethod, a novel cluster simulation methodology that facilitates efficient cluster capacity planning, performance evaluation, and optimization before system provisioning. In our methodology, software stacks are simulated by an abstract yet high-fidelity model, and hardware activities derived from software operations are dynamically mapped onto architecture models for processors, memory, storage, and networking devices. This hardware/software hybrid methodology enables low-overhead, fast, and accurate cluster simulation that can easily be carried out on a standard client platform (desktop or laptop). Our experimental results with six popular Hadoop workloads demonstrate that CSMethod can achieve an average error rate of less than six percent across various software parameters and cluster hardware configurations. We also illustrate the application of the proposed methodology with two real-world use cases: video-streaming service system planning and Terasort cluster optimization. All our experiments run on a commodity laptop, with execution speeds faster than native execution on a multi-node high-end cluster.
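The abstract describes mapping hardware activities, derived from software operations, onto device models. The sketch below is not CSMethod and does not reproduce the paper's models. It is a minimal, hypothetical illustration of that general idea: each software task is written as a sequence of hardware demands (for example, CPU cycles or disk bytes), and each device is assumed to be a single first-come-first-served server with a fixed service rate. The names `Device` and `simulate`, the rates, and the demands are all illustrative assumptions.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical device model: one FCFS server with a fixed service rate."""
    name: str
    rate: float  # work units per second (e.g. cycles/s for a CPU, bytes/s for a disk)


def simulate(tasks, devices):
    """Discrete-event sketch mapping software tasks onto hardware device models.

    tasks:   dict task_id -> list of (device_name, demand) phases, run in order.
    devices: dict device_name -> Device.
    Returns a dict task_id -> completion time in seconds.
    """
    free_at = {name: 0.0 for name in devices}  # time at which each device becomes idle
    finish = {}
    # Events are (ready_time, task_id, phase_index); all tasks start at t = 0.
    heap = [(0.0, tid, 0) for tid in sorted(tasks)]
    heapq.heapify(heap)
    while heap:
        # Events are popped in nondecreasing time order, so devices serve requests FCFS.
        ready, tid, phase = heapq.heappop(heap)
        if phase == len(tasks[tid]):
            finish[tid] = ready  # all phases of this task are done
            continue
        dev_name, demand = tasks[tid][phase]
        start = max(ready, free_at[dev_name])  # wait if the device is busy
        end = start + demand / devices[dev_name].rate
        free_at[dev_name] = end
        heapq.heappush(heap, (end, tid, phase + 1))
    return finish


if __name__ == "__main__":
    # Hypothetical cluster node: a 100 MB/s disk and a 2 GHz CPU.
    devices = {"disk": Device("disk", 100e6), "cpu": Device("cpu", 2e9)}
    # Two identical map-like tasks: read 100 MB, then spend 2e9 CPU cycles.
    tasks = {
        "t1": [("disk", 100e6), ("cpu", 2e9)],
        "t2": [("disk", 100e6), ("cpu", 2e9)],
    }
    print(simulate(tasks, devices))  # {'t1': 2.0, 't2': 3.0}
```

Here `t2` waits one second for the disk while `t1` reads, so contention alone pushes its completion to 3.0 s. This kind of effect is what a hardware-level model captures and a purely analytic per-task estimate would miss. A real simulator would add multi-core CPUs, memory, networks, and a software-stack model on top of this loop.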
Keywords :
Big Data; middleware; performance evaluation; video streaming; CSMethod; Hadoop workloads; Terasort cluster optimization; big data clusters; cluster capacity planning; cluster hardware configurations; cluster simulation methodology; computer clusters; hardware diversity; hardware-software hybrid methodology; software parameters; software stack complexity; system evaluation; system optimization; system provisioning; video-streaming service system planning; Big data; Computational modeling; Computer architecture; Hardware; Program processors; Unified modeling language; big data; cluster simulation; data center capacity planning; performance modeling
Conference_Title :
2014 43rd International Conference on Parallel Processing (ICPP)
Conference_Location :
Minneapolis, MN
DOI :
10.1109/ICPP.2014.48