• DocumentCode
    573544
  • Title

    Same Queries, Different Data: Can We Predict Runtime Performance?

  • Author

    Popescu, Adrian Daniel ; Ercegovac, Vuk ; Balmin, Andrey ; Branco, Miguel ; Ailamaki, Anastasia

  • Author_Institution
    Ecole Polytech. Fed. de Lausanne, Lausanne, Switzerland
  • fYear
    2012
  • fDate
    1-5 April 2012
  • Firstpage
    275
  • Lastpage
    280
  • Abstract
    We consider MapReduce workloads that are produced by analytics applications. In contrast to ad hoc query workloads, analytics applications are composed of fixed data flows that are run over newly arriving data sets or on different portions of an existing data set. Examples of such workloads include document analysis/indexing, social media analytics, and ETL (Extract Transform Load). Motivated by these workloads, we propose a technique that predicts the runtime performance for a fixed set of queries running over varying input data sets. Our prediction technique splits each query into several segments, where each segment's performance is estimated using machine learning models. These per-segment estimates are plugged into a global analytical model to predict the overall query runtime. Our approach uses minimal statistics about the input data sets (e.g., tuple size, cardinality), which are complemented with historical information about prior query executions (e.g., execution time). We analyze the accuracy of predictions for several segment granularities, both on standard analytical benchmarks such as TPC-DS [17] and on several real workloads. Prediction error is below 25% for 90% of predictions.
  • Keywords
    data analysis; learning (artificial intelligence); query processing; statistics; ETL; MapReduce workloads; TPC-DS; ad hoc query workloads; analytics applications; data flows; data sets; document analysis; document indexing; extract transform load; global analytical model; machine learning models; minimal statistics; query executions; query runtime performance prediction; segment granularity prediction; social media analytics; Analytical models; Computational modeling; Context; Data models; Estimation; Predictive models; Runtime;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering Workshops (ICDEW), 2012 IEEE 28th International Conference on
  • Conference_Location
    Arlington, VA
  • Print_ISBN
    978-1-4673-1640-8
  • Type

    conf

  • DOI
    10.1109/ICDEW.2012.66
  • Filename
    6313693
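
  • Illustration
    The two-level scheme described in the abstract (a learned estimate per query segment, combined by a global model) can be sketched roughly as follows. This is a minimal sketch only and not the authors' method: the record does not say which machine learning models or which analytical model the paper uses. The sketch assumes a one-feature least-squares fit per segment (runtime against cardinality times tuple size) and assumes segments run one after another, so the global model is a plain sum. The names `SegmentModel`, `fit_segment`, and `predict_query` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SegmentModel:
    """Hypothetical per-segment model: seconds ~ intercept + slope * input_bytes."""
    intercept: float
    slope: float

    def predict(self, cardinality, tuple_size):
        # Assumed feature: input volume = number of tuples * bytes per tuple.
        return self.intercept + self.slope * cardinality * tuple_size


def fit_segment(history):
    """Least-squares fit over prior runs of one segment.

    history: list of (cardinality, tuple_size, observed_seconds) records.
    """
    xs = [card * size for card, size, _ in history]
    ys = [secs for _, _, secs in history]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # All prior runs had the same input volume: fall back to the mean runtime.
        return SegmentModel(intercept=mean_y, slope=0.0)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov / var_x
    return SegmentModel(intercept=mean_y - slope * mean_x, slope=slope)


def predict_query(models, stats):
    """Assumed global model: segments execute sequentially, so estimates add up.

    models: one SegmentModel per segment.
    stats:  one (cardinality, tuple_size) pair per segment for the new data set.
    """
    return sum(m.predict(card, size) for m, (card, size) in zip(models, stats))


# Made-up execution histories for a query split into two segments.
scan_history = [(1000, 100, 2.0), (2000, 100, 3.0), (4000, 100, 5.0)]
join_history = [(500, 50, 1.0), (1000, 50, 1.5)]

models = [fit_segment(scan_history), fit_segment(join_history)]
estimate = predict_query(models, [(3000, 100), (800, 50)])
print(f"predicted runtime: {estimate:.2f} s")
```

    The history values above are invented for illustration. A real global model would also need to account for segments that overlap or run in parallel, which a plain sum ignores.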