  • DocumentCode
    160652
  • Title
    Impact of MapReduce Task Re-execution Policy on Job Completion Reliability and Job Completion Time
  • Author
    Jia-Chun Lin ; Fang-Yie Leu ; Ying-ping Chen ; Waqaas Munawar
  • Author_Institution
    Dept. of Comput. Sci., Nat. Chiao Tung Univ., Hsinchu, Taiwan
  • fYear
    2014
  • fDate
    13-16 May 2014
  • Firstpage
    712
  • Lastpage
    718
  • Abstract
    MapReduce has become a widely accepted framework for data-intensive applications. To prevent MapReduce jobs from being interrupted by node failures, which occur frequently in large-scale MapReduce clusters, current MapReduce implementations such as Hadoop employ a task re-execution policy (TR policy for short): when a map or reduce task of a job fails because of a node failure, the task is re-executed on another node. However, the impact of the TR policy on job completion reliability and job completion time has not been studied from a theoretical viewpoint, especially for jobs with different characteristics, e.g., different input data sizes, different numbers of reduce tasks, and different intermediate data sizes. In this study, we derive the job completion reliability (JCR for short) of a MapReduce job based on Poisson distributions and analyze the expected job completion time (JCT for short) based on the universal generating function. We use nine settings of the task re-execution factor (TR factor for short) to explore the impact of the TR policy on the JCR and JCT of jobs. The results show that the TR policy can effectively improve JCR without significantly prolonging JCT. However, no single TR factor enables all jobs to achieve a high JCR.
  • Keywords
    Poisson distribution; computer network reliability; distributed processing; MapReduce job; MapReduce task re-execution policy; data-intensive application; job completion reliability; job completion time; node failure; task re-execution factor; universal generating function; Educational institutions; Google; Programming; Reliability theory; Servers; MapReduce
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2014 IEEE 28th International Conference on Advanced Information Networking and Applications (AINA)
  • Conference_Location
    Victoria, BC
  • ISSN
    1550-445X
  • Print_ISBN
    978-1-4799-3629-8
  • Type
    conf
  • DOI
    10.1109/AINA.2014.87
  • Filename
    6838734