DocumentCode :
2450303
Title :
Improving MapReduce fault tolerance in the cloud
Author :
Zheng, Qin
Author_Institution :
Adv. Comput. Programme, Inst. of High Performance Comput., Singapore, Singapore
fYear :
2010
fDate :
19-23 April 2010
Firstpage :
1
Lastpage :
6
Abstract :
MapReduce has been used at Google, Yahoo, Facebook, and other companies, even for their production jobs. However, according to a recent study, a single failure in a Hadoop job could cause a 50% increase in completion time. Amazon Elastic MapReduce is offered to help users perform data-intensive tasks for their applications, which may have high fault tolerance and/or tight SLA requirements. MapReduce fault tolerance in the cloud is more challenging, however, because topology control and (data) rack locality are currently not possible. In this paper, we investigate how redundant copies can be provisioned for tasks to improve MapReduce fault tolerance in the cloud while reducing latency.
Keywords :
Internet; Web sites; fault tolerant computing; Amazon Elastic MapReduce; Hadoop job; MapReduce fault tolerance; SLA requirement; Availability; Cloud computing; Delay; Disk drives; Facebook; Fault tolerance; High performance computing; Job production systems; Open source software; Topology; MapReduce; backup; fault tolerance; scheduling;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW), 2010 IEEE International Symposium on
Conference_Location :
Atlanta, GA
Print_ISBN :
978-1-4244-6533-0
Type :
conf
DOI :
10.1109/IPDPSW.2010.5470865
Filename :
5470865