• DocumentCode
    2054188
  • Title
    Improving MapReduce Performance via Heterogeneity-Load-Aware Partition Function
  • Author
    Sun, Huifeng ; Chen, Junliang ; Liu, Chuanchang ; Zheng, Zibin ; Yu, Nan ; Yang, Zhi
  • Author_Institution
    State Key Lab. of Networking & Switching Technol., Beijing Univ. of Posts & Telecommun., Beijing, China
  • fYear
    2011
  • fDate
    26-30 Sept. 2011
  • Firstpage
    557
  • Lastpage
    560
  • Abstract
    MapReduce is an important programming model for large-scale data-intensive applications such as web indexing, scientific simulation, and data mining. Hadoop is a widely adopted open-source implementation of MapReduce. The partition function is an important component of Hadoop: it splits map outputs into partitions that become the input data of the reduces. Hadoop's default partition function distributes intermediate keys across reduces on the assumption that cluster nodes are homogeneous and perform work at roughly the same rate. In practice, however, this homogeneity assumption seldom holds, and cluster nodes usually perform work at different rates. In this paper, we design a heterogeneity-load-aware partition function named the proportional partition function (PPF). Besides the dynamic load of cluster nodes, PPF considers differences in node capacity, such as CPU processing speed and disk writing speed.
  • Keywords
    distributed processing; public domain software; CPU processing speed; Hadoop; MapReduce performance improvement; Web indexing; data mining; disk writing speed; dynamic loading; heterogeneity-load-aware partition function; large-scale data-intensive application; open-source implementation; programming model; proportional partition function; scientific simulation; Computers; Data mining; Degradation; Dynamic scheduling; Educational institutions; Loading; Hadoop; MapReduce; heterogeneity; partition function; scheduling
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster Computing (CLUSTER), 2011 IEEE International Conference on
  • Conference_Location
    Austin, TX
  • Print_ISBN
    978-1-4577-1355-2
  • Electronic_ISBN
    978-0-7695-4516-5
  • Type
    conf
  • DOI
    10.1109/CLUSTER.2011.68
  • Filename
    6061207
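
The abstract contrasts Hadoop's default partitioning with a partition function that accounts for node capacity and load. The sketch below illustrates that contrast only. It is not the paper's PPF, whose formula the abstract does not give. The names `default_partition`, `proportional_partition`, and `effective_weights`, and the weighting rule (capacity scaled by idle fraction), are assumptions made for illustration.

```python
import bisect
import hashlib
from itertools import accumulate


def _hash32(key: str) -> int:
    """Stable 32-bit hash of a key (Python's built-in hash() is salted per run)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:4], "big")


def default_partition(key: str, num_reduces: int) -> int:
    """Hash-modulo partitioning: every reduce gets about the same share of keys."""
    return _hash32(key) % num_reduces


def effective_weights(capacities, loads):
    """Hypothetical weighting: relative capacity scaled by the idle fraction.

    capacities: relative speed per reduce node (assumed input, e.g. 2.0 = twice as fast)
    loads: current load per node in [0, 1) (assumed input)
    """
    return [c * (1.0 - l) for c, l in zip(capacities, loads)]


def proportional_partition(key: str, weights) -> int:
    """Map a key to a reduce index with share proportional to that reduce's weight.

    The mapping is deterministic, so all values for one key reach the same reduce.
    """
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be non-empty and strictly positive")
    bounds = list(accumulate(weights))           # cumulative upper bounds
    point = (_hash32(key) / 2**32) * bounds[-1]  # position in [0, total weight)
    return min(bisect.bisect_right(bounds, point), len(weights) - 1)


if __name__ == "__main__":
    weights = effective_weights(capacities=[3.0, 1.0], loads=[0.0, 0.0])
    counts = [0, 0]
    for i in range(10_000):
        counts[proportional_partition(f"key-{i}", weights)] += 1
    print(counts)  # the first reduce should receive roughly three times as many keys
```

With equal weights the proportional version behaves like an even split, so the homogeneous case is a special case of this scheme.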