  • DocumentCode
    56820
  • Title
    Impact of MapReduce Policies on Job Completion Reliability and Job Energy Consumption
  • Author
    Jia-Chun Lin; Fang-Yie Leu; Ying-ping Chen
  • Author_Institution
    Dept. of Comput. Sci., Nat. Chiao Tung Univ., Hsinchu, Taiwan
  • Volume
    26
  • Issue
    5
  • fYear
    2015
  • fDate
    May 1 2015
  • Firstpage
    1364
  • Lastpage
    1378
  • Abstract
    Recently, MapReduce has been widely employed by many companies and organizations to tackle data-intensive problems on large-scale MapReduce clusters. To cope with machine/node failures, which are inevitable in a MapReduce cluster, MapReduce employs several policies, such as input-data replication and intermediate-data replication policies. To speed up job execution, MapReduce also allows reduce tasks to fetch their required intermediate data early. However, the impact of combinations of these policies on the job completion reliability (JCR for short) and job energy consumption (JEC for short) of a MapReduce cluster has not been clear. Here, JCR is the reliability with which the cluster can complete a MapReduce job, and JEC is the energy the cluster consumes to complete the job. Therefore, in this study, we analyze the JCR and JEC of a MapReduce cluster under four policy combinations (POCs for short) derived from two typical intermediate-data replication policies and two typical reduce-task assignment policies. The four POCs are further compared across extensive scenarios that consider not only jobs of different scales with various parameters but also clusters with two extreme parallel-execution capabilities and diverse bandwidths. The analytical results enable MapReduce managers to understand how these POCs affect the JCR and JEC of a cluster and then select an appropriate POC based on the characteristics of their own MapReduce jobs and clusters.
  • Keywords
    energy consumption; parallel programming; resource allocation; MapReduce policy; input-data replication policy; intermediate-data replication policy; job completion reliability; job energy consumption; reduce-task assignment policy; Bandwidth; Computer science; Delays; Distributed databases; Power demand; Reliability; Intermediate-data replication; MapReduce
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type
    jour
  • DOI
    10.1109/TPDS.2014.2374600
  • Filename
    6966761