DocumentCode :
172884
Title :
Improving MapReduce Performance in a Heterogeneous Cloud: A Measurement Study
Author :
Xu Zhao ; Ling Liu ; Qi Zhang ; Xiaoshe Dong
Author_Institution :
Xi'an Jiaotong Univ., Xi'an, China
fYear :
2014
fDate :
June 27 2014-July 2 2014
Firstpage :
400
Lastpage :
407
Abstract :
Hybrid clouds, geo-distributed clouds, and continuous upgrades of computing, storage, and networking resources in the cloud have driven datacenters to evolve toward heterogeneous clusters. Unfortunately, most MapReduce implementations are designed for homogeneous computing environments and perform poorly in heterogeneous clusters. Although a fair amount of research effort has been dedicated to improving MapReduce performance, an in-depth understanding of the key factors that affect the performance of MapReduce jobs in heterogeneous clusters is still lacking. In this paper, we present an extensive experimental study of two categories of factors: system configuration and task scheduling. Our measurement study shows that an in-depth understanding of these factors is critical for improving MapReduce performance in a heterogeneous environment. We conclude with five key findings: (1) Early shuffle, though effective for reducing the latency of MapReduce jobs, can affect the performance of map tasks and reduce tasks differently when they run on different types of nodes. (2) The two phases in map tasks differ in their sensitivity to input block size, and the ratio of the sort phase varies with block size differently on different types of nodes. (3) Scheduling map or reduce tasks dynamically with node capacity and workload awareness can further enhance job performance and improve resource consumption efficiency. (4) Although random scheduling of reduce tasks works well in homogeneous clusters, it can significantly degrade performance in heterogeneous clusters when the shuffled data size is large. (5) Phase-aware progress rate estimation and a speculation strategy can provide a substantial performance gain over the state-of-the-art speculation scheduler.
Keywords :
cloud computing; computer centres; parallel programming; MapReduce job latency reduction; MapReduce performance improvement; continuous computing resource up-gradation; continuous networking resource up-gradation; continuous storage resource up-gradation; data centers; dynamic map scheduling; dynamic task reduction; geo-distributed cloud; heterogeneous cloud; heterogeneous clusters; homogeneous clusters; homogeneous computing environments; hybrid clouds; input block size; job performance enhancement; map scheduling; map task performance; node capacity; performance gain; phase-aware progress rate estimation; random scheduling; resource consumption efficiency improvement; shuffled data size; sort phase ratio; speculation scheduler; speculation strategy; system configuration; task scheduling; workload awareness; Benchmark testing; Correlation; Current measurement; Peer-to-peer computing; Performance gain; Resource management; Scheduling;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cloud Computing (CLOUD), 2014 IEEE 7th International Conference on
Conference_Location :
Anchorage, AK
Print_ISBN :
978-1-4799-5062-1
Type :
conf
DOI :
10.1109/CLOUD.2014.61
Filename :
6973767