  • DocumentCode
    608041
  • Title
    h-MapReduce: A Framework for Workload Balancing in MapReduce
  • Author
    Martha, V.S.; Weizhong Zhao; Xiaowei Xu
  • Author_Institution
    Univ. of Arkansas at Little Rock, Little Rock, AR, USA
  • fYear
    2013
  • fDate
    25-28 March 2013
  • Firstpage
    637
  • Lastpage
    644
  • Abstract
    The big data analytics community has accepted MapReduce as a programming model for processing massive data on distributed systems such as a Hadoop cluster. MapReduce has been evolving to improve its performance. We identified skewed workload among workers in the MapReduce ecosystem, a problem of serious concern for massive data processing. We tackled the workload balancing issue by introducing a hierarchical MapReduce, or h-MapReduce for short. h-MapReduce identifies a heavy task using a properly defined cost function. The heavy task is divided into child tasks that are distributed among available workers as a new job in the MapReduce framework. Invoking new jobs from within a task poses several challenges, which h-MapReduce addresses. Our experiments showed a performance gain for h-MapReduce over standard MapReduce on data-intensive algorithms. More specifically, the performance gain increases exponentially with the size of the networks. In addition to the exponential performance gains, our investigations also found a negative effect of deploying h-MapReduce when heavy tasks are defined inappropriately, which provides us with a guideline for applying h-MapReduce effectively.
  • Keywords
    parallel programming; resource allocation; Hadoop cluster; MapReduce programming model; big data analytics community; cost function; data intensive algorithm; distributed systems; h-MapReduce framework; hierarchical MapReduce; performance gain; skewed workload; workload balancing; Clustering algorithms; Ecosystems; Programming; Social network services; Standards; System recovery; MapReduce
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advanced Information Networking and Applications (AINA), 2013 IEEE 27th International Conference on
  • Conference_Location
    Barcelona
  • ISSN
    1550-445X
  • Print_ISBN
    978-1-4673-5550-6
  • Electronic_ISBN
    1550-445X
  • Type
    conf
  • DOI
    10.1109/AINA.2013.48
  • Filename
    6531814