• DocumentCode
    9263
  • Title

    CRESP: Towards Optimal Resource Provisioning for MapReduce Computing in Public Clouds

  • Author

    Keke Chen ; Jacob Powers ; Shumin Guo ; Fengguang Tian

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Wright State Univ., Dayton, OH, USA
  • Volume
    25
  • Issue
    6
  • fYear
    2014
  • fDate
    June 2014
  • Firstpage
    1403
  • Lastpage
    1412
  • Abstract
    Running MapReduce programs in the cloud introduces a unique problem: how to optimize resource provisioning to minimize the monetary cost or the finish time of a specific job. We study the whole process of MapReduce processing and build a cost function that explicitly models the relationship among the time cost, the amount of input data, the available system resources (Map and Reduce slots), and the complexity of the Reduce function for the target MapReduce job. The model parameters can be learned from test runs. Based on this cost function, we can solve a number of decision problems, such as finding the amount of resources that minimizes monetary cost under a job finish deadline, minimizing time cost under a given monetary budget, or finding the optimal tradeoffs between time and monetary costs. Experimental results show that the proposed approach performs well on a number of sample MapReduce programs in both an in-house cluster and Amazon EC2. We also conducted a variance analysis on different components of the MapReduce workflow to show the possible sources of modeling error. Our optimization results show that the proposed approach can save a significant amount of time and money compared to randomly selected settings.
  • Keywords
    cloud computing; parallel programming; Amazon EC2; CRESP; MapReduce computing; MapReduce workflow; cost function; in-house cluster; monetary cost; optimal resource provisioning; public clouds; variance analysis; Analytical models; Complexity theory; Data models; Mathematical model; MapReduce; performance modeling; resource provisioning
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type

    jour

  • DOI
    10.1109/TPDS.2013.297
  • Filename
    6678508