DocumentCode :
2181243
Title :
On Improving Fault Tolerance for Heterogeneous Hadoop MapReduce Clusters
Author :
Chi-Yi Lin ; Ting-Hau Chen ; Yi-No Cheng
Author_Institution :
Dept. Comput. Sci. & Inf. Eng., Tamkang Univ., Taipei, Taiwan
fYear :
2013
fDate :
16-19 Dec. 2013
Firstpage :
38
Lastpage :
43
Abstract :
The computing paradigm of MapReduce has gained extreme popularity in the area of large-scale data-intensive applications in recent years. Hadoop, an open-source implementation of MapReduce, can be set up easily and rapidly on commodity hardware to form a massive computing cluster. In such a cluster, task failures and node failures are not an anomaly, and they can have a substantial impact on Hadoop's performance. Although Hadoop can restart failed tasks automatically and compensate for slow tasks by enabling speculative execution, many researchers have identified shortcomings in Hadoop's fault tolerance. In this research, we try to address these shortcomings by designing a simple checkpointing mechanism for Map tasks and by using a revised criterion for identifying slow tasks. Specifically, our checkpointing mechanism saves the partial output produced by the Mappers, and our criterion for identifying slow tasks considers tasks with variable progress rates. Although preliminary simulations show only marginal performance improvement compared with native Hadoop and the LATE scheduler, we believe that our approaches have the potential to offer greater performance gains on real workloads.
Keywords :
checkpointing; fault tolerant computing; parallel processing; public domain software; software performance evaluation; Hadoop fault tolerance; Hadoop performance; LATE scheduler; Map tasks; checkpointing mechanism; commodity hardware; computing cluster; heterogeneous Hadoop MapReduce clusters; large-scale data-intensive applications; node failures; open-source MapReduce implementation; speculative execution; task failures; Abstracts; Checkpointing; Cloud computing; Data models; Dynamic scheduling; Google; MapReduce; checkpointing; heterogeneous environments; intermediate data; speculative execution;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Cloud Computing and Big Data (CloudCom-Asia), 2013 International Conference on
Conference_Location :
Fuzhou
Print_ISBN :
978-1-4799-2829-3
Type :
conf
DOI :
10.1109/CLOUDCOM-ASIA.2013.83
Filename :
6820971