• DocumentCode
    2450303
  • Title

    Improving MapReduce fault tolerance in the cloud

  • Author

    Zheng, Qin

  • Author_Institution
    Adv. Comput. Programme, Inst. of High Performance Comput., Singapore, Singapore
  • fYear
    2010
  • fDate
    19-23 April 2010
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    MapReduce has been used at Google, Yahoo, Facebook, etc., even for their production jobs. However, according to a recent study, a single failure on a Hadoop job could cause a 50% increase in completion time. Amazon Elastic MapReduce has been provided to help users perform data-intensive tasks for their applications. These applications may have high fault tolerance and/or tight SLA requirements. However, MapReduce fault tolerance in the cloud is more challenging, as topology control and (data) rack locality are currently not possible. In this paper, we investigate how redundant copies can be provisioned for tasks to improve MapReduce fault tolerance in the cloud while reducing latency.
  • Keywords
    Internet; Web sites; fault tolerant computing; Amazon Elastic MapReduce; Hadoop job; MapReduce fault tolerance; SLA requirement; Availability; Cloud computing; Delay; Disk drives; Facebook; Fault tolerance; High performance computing; Job production systems; Open source software; Topology; MapReduce; backup; fault tolerance; scheduling
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), 2010 IEEE International Symposium on
  • Conference_Location
    Atlanta, GA
  • Print_ISBN
    978-1-4244-6533-0
  • Type

    conf

  • DOI
    10.1109/IPDPSW.2010.5470865
  • Filename
    5470865