• DocumentCode
    3172223
  • Title
    Grey-Box Approach for Performance Prediction in Map-Reduce Based Platforms
  • Author
    Kadirvel, Selvi ; Fortes, José A B
  • Author_Institution
    Adv. Comput. & Inf. Syst. Lab., Univ. of Florida, Gainesville, FL, USA
  • fYear
    2012
  • fDate
    July 30 2012-Aug. 2 2012
  • Firstpage
    1
  • Lastpage
    9
  • Abstract
    Map-Reduce has become an important paradigm for data-intensive computations. The ability to estimate Map-Reduce application performance is critical for efficient resource scheduling and provisioning both on dedicated clusters and on the cloud. Current state-of-the-art techniques for performance prediction of Map-Reduce applications use analytical and simulation-based models. In this paper, we make the case for performance prediction using regression techniques based on machine learning. Through modeling the Map-Reduce environment as a grey-box, we can leverage a combination of externally observed system features and information about sub-system internals. We identify four learning techniques with high prediction accuracy through a detailed comparative study of twenty methods. The powerful capabilities of data analytics platforms are usually accompanied by frequent faults that occur due to scale, complexity and the use of commercial off-the-shelf components. We show that our proposed approach can effectively predict degraded performance under these faulty conditions by the inclusion of additional fault-related input features. A mean prediction error of <12% was achieved across the range of parameters studied on a 64-node Xen virtualized environment running an open-source Map-Reduce implementation, Hadoop.
  • Keywords
    cloud computing; data analysis; grey systems; learning (artificial intelligence); scheduling; virtual reality; Hadoop; Xen virtualized environment; cloud; commercial off-the-shelf components; data analytics; data-intensive computations; dedicated clusters; grey-box approach; machine learning; map-reduce based platforms; performance prediction; resource scheduling; Computational modeling; Fault tolerance; Fault tolerant systems; Mathematical model; Middleware; Predictive models; Virtual machining
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Communications and Networks (ICCCN), 2012 21st International Conference on
  • Conference_Location
    Munich
  • Print_ISBN
    978-1-4673-1543-2
  • Type
    conf
  • DOI
    10.1109/ICCCN.2012.6289311
  • Filename
    6289311