  • DocumentCode
    3434962
  • Title
    Cluster-Size Scaling and MapReduce Execution Times
  • Author
    Zhang, Fan ; Sakr, Majd
  • Author_Institution
    Massachusetts Inst. of Technol., Albany, MA, USA
  • Volume
    1
  • fYear
    2013
  • fDate
    2-5 Dec. 2013
  • Firstpage
    240
  • Lastpage
    249
  • Abstract
    Understanding performance scalability in MapReduce applications is a challenging problem. The difficulty lies in the distributed locations of input data and in the distributed compute resources, which use varied network substrates. User-defined Map and Reduce stages, with numerous application parameters, further complicate the problem. Using small datasets and limited test runs to understand how MapReduce applications will behave with "big data" can have a significant payoff. In this paper, we evaluate the impact of cluster-size scaling on execution time for a set of Map- and Reduce-intensive applications. We model the MapReduce framework, specify the conditions and implications of power-law conformity, and verify our model with data from benchmark MapReduce applications. Empirical results indicate that: (1) within a range of scaling parameters, MapReduce execution times follow a power-law distribution; (2) power-law scalability for Map-intensive applications starts from a small cluster size; (3) Shuffle-intensive applications exhibit power-law behavior only starting from larger clusters; and (4) cluster-scaling performance gains fail to show power-law behavior when computing resources far exceed those needed. Our findings will facilitate using small-scale test runs to allocate and configure virtual and physical computing resources in large-scale clouds.
  • Keywords
    cloud computing; parallel programming; power aware computing; virtual machines; Big data; MapReduce execution time; benchmark MapReduce applications; cluster-size scaling; distributed compute resources; distributed locations; large scale clouds; model verification; performance scalability; physical computing resources; power-law conformity; power-law distribution; power-law scalability; virtual computing resources; Analytical models; Bandwidth; Benchmark testing; Computational modeling; Lead; Mathematical model; Scalability; Cluster scaling; Large-scale over-provisioning; MapReduce applications; Power-law distribution; Small-scale limitation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2013 IEEE 5th International Conference on Cloud Computing Technology and Science (CloudCom)
  • Conference_Location
    Bristol
  • Type
    conf
  • DOI
    10.1109/CloudCom.2013.39
  • Filename
    6753804