DocumentCode
154161
Title
Simulating Big Data Clusters for System Planning, Evaluation, and Optimization
Author
Zhaojuan Bian ; Kebing Wang ; Zhihong Wang ; Gene Munce ; Illia Cremer ; Wei Zhou ; Qian Chen ; Gen Xu
Author_Institution
Software & Service Group, Intel Corp., Shanghai, China
fYear
2014
fDate
9-12 Sept. 2014
Firstpage
391
Lastpage
400
Abstract
With the fast development of big data technologies, IT spending on computer clusters is increasing rapidly as well. To minimize cost, architects must plan big data clusters by carefully evaluating various design choices. Current capacity planning methods are mostly based on trial and error or on high-level estimation. These approaches, however, are far from efficient, especially given increasing hardware diversity and software stack complexity. In this paper, we present CSMethod, a novel cluster simulation methodology that facilitates efficient cluster capacity planning, performance evaluation, and optimization before system provisioning. With our proposed methodology, software stacks are simulated by an abstract yet high-fidelity model, and hardware activities derived from software operations are dynamically mapped onto architecture models for processors, memory, storage, and networking devices. This hybrid hardware/software methodology allows low-overhead, fast, and accurate cluster simulation that can easily be carried out on a standard client platform (desktop or laptop). Our experimental results with six popular Hadoop workloads demonstrate that CSMethod can achieve an average error rate of less than six percent across various software parameters and cluster hardware configurations. We also illustrate the application of the proposed methodology with two real-world use cases: video-streaming service system planning and Terasort cluster optimization. All our experiments are run on a commodity laptop, with execution speeds faster than native execution on a multi-node high-end cluster.
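The abstract describes a hybrid approach in which software operations generate hardware activities that are then mapped onto architecture models. The following is a minimal, hypothetical sketch of that general idea, not CSMethod itself: each hardware device is assumed to be a single FIFO server with a fixed throughput, a simplified Hadoop-style map-task model emits disk, CPU, and network work, and a discrete-event loop computes the job's makespan. The names `Resource`, `map_task`, `make_resources`, `simulate`, and `mean_relative_error`, as well as all rates and per-byte costs, are illustrative assumptions; this record does not describe the paper's actual models or its exact error metric (mean relative error is assumed here).

```python
import heapq
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical hardware model: one FIFO server with a fixed throughput."""
    name: str
    rate: float           # work units per second (bytes/s or cycles/s)
    free_at: float = 0.0  # simulated time at which the server next becomes idle

    def serve(self, arrival: float, amount: float) -> float:
        """Queue `amount` units of work arriving at `arrival`; return its completion time."""
        start = max(arrival, self.free_at)
        self.free_at = start + amount / self.rate
        return self.free_at


def map_task(input_bytes: float, cycles_per_byte: float = 50.0):
    """Hypothetical software model of one map task, expressed as hardware activities:
    read the input split from disk, process it on the CPU, and shuffle half of it over the network."""
    return [
        ("disk", input_bytes),
        ("cpu", input_bytes * cycles_per_byte),
        ("net", input_bytes * 0.5),
    ]


def make_resources(disk: float, cpu: float, net: float) -> dict:
    """Build a fresh set of device models (they carry state, so rebuild per run)."""
    return {
        "disk": Resource("disk", disk),
        "cpu": Resource("cpu", cpu),
        "net": Resource("net", net),
    }


def simulate(tasks, resources) -> float:
    """Discrete-event loop: process each task's next activity in time order; return the makespan."""
    # Event = (time the task is ready, task id, index of its next activity).
    events = [(0.0, i, 0) for i in range(len(tasks))]
    heapq.heapify(events)
    makespan = 0.0
    while events:
        t, task_id, step = heapq.heappop(events)
        activities = tasks[task_id]
        if step == len(activities):
            makespan = max(makespan, t)  # this task has finished all its activities
            continue
        device, amount = activities[step]
        done = resources[device].serve(t, amount)
        heapq.heappush(events, (done, task_id, step + 1))
    return makespan


def mean_relative_error(simulated, measured) -> float:
    """Average of |sim - meas| / meas over paired runs (assumed error definition)."""
    pairs = list(zip(simulated, measured))
    return sum(abs(s - m) / m for s, m in pairs) / len(pairs)
```

For example, with two identical tasks whose three stages each take one second, the devices form a pipeline, so the second task finishes one second after the first:

```python
res = make_resources(disk=100.0, cpu=1000.0, net=50.0)
print(simulate([map_task(100.0, cycles_per_byte=10.0)] * 2, res))  # 4.0
```

Processing events in non-decreasing time order guarantees that each device sees its requests in arrival order, which is what makes the simple `free_at` bookkeeping a valid FIFO queue.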
Keywords
Big Data; middleware; performance evaluation; video streaming; CSMethod; Hadoop workloads; Terasort cluster optimization; big data clusters; cluster capacity planning; cluster hardware configurations; cluster simulation methodology; computer clusters; hardware diversity; hardware-software hybrid methodology; software parameters; software stack complexity; system evaluation; system optimization; system provisioning; video-streaming service system planning; Computational modeling; Computer architecture; Hardware; Program processors; Unified modeling language; cluster simulation; data center capacity planning; performance modeling
fLanguage
English
Publisher
IEEE
Conference_Title
Parallel Processing (ICPP), 2014 43rd International Conference on
Conference_Location
Minneapolis, MN
ISSN
0190-3918
Type
conf
DOI
10.1109/ICPP.2014.48
Filename
6957248