• DocumentCode
    650575
  • Title

    Using a Tunable Knob for Reducing Makespan of MapReduce Jobs in a Hadoop Cluster

  • Author

    Yi Yao ; Jiayin Wang ; Bo Sheng ; Ningfang Mi

  • Author_Institution
    Northeastern Univ., Boston, MA, USA
  • fYear
    2013
  • fDate
    June 28 2013-July 3 2013
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    The MapReduce framework and its open source implementation Hadoop have become the de facto platform for scalable analysis of large data sets in recent years. One of the primary concerns in Hadoop is how to minimize the completion length (i.e., makespan) of a set of MapReduce jobs. Hadoop currently allows only static slot configuration, i.e., fixed numbers of map slots and reduce slots throughout the lifetime of a cluster. However, we found that such a static configuration may lead to low system resource utilization as well as long completion length. Motivated by this, we propose a simple yet effective scheme which uses the slot ratio between map and reduce tasks as a tunable knob for reducing the makespan of a given set of jobs. By leveraging the workload information of recently completed jobs, our scheme dynamically allocates resources (or slots) to map and reduce tasks. We implemented the presented scheme in Hadoop V0.20.2 and evaluated it with representative MapReduce benchmarks on Amazon EC2. The experimental results demonstrate the effectiveness and robustness of our scheme under both simple workloads and more complex mixed workloads.
  • Keywords
    parallel processing; pattern clustering; public domain software; resource allocation; Amazon EC2; Hadoop V0.20.2; Hadoop cluster; MapReduce benchmarks; MapReduce framework; MapReduce jobs; cluster lifetime; completion length; defacto platform; map slots; open source implementation; reduce slots; resource allocation; static configuration; static slot configuration; system resource utilizations; tunable knob; workload information; Algorithm design and analysis; Clustering algorithms; Educational institutions; Estimation; Heuristic algorithms; Optimized production technology; Resource management;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cloud Computing (CLOUD), 2013 IEEE Sixth International Conference on
  • Conference_Location
    Santa Clara, CA
  • Print_ISBN
    978-0-7695-5028-2
  • Type

    conf

  • DOI
    10.1109/CLOUD.2013.140
  • Filename
    6676671