• DocumentCode
    2079995
  • Title

    Optimizing ETL workflows for fault-tolerance

  • Author

    Simitsis, Alkis ; Wilkinson, Kevin ; Dayal, Umeshwar ; Castellanos, Malu

  • Author_Institution
    HP Labs, Palo Alto, CA, USA
  • fYear
    2010
  • fDate
    1-6 March 2010
  • Firstpage
    385
  • Lastpage
    396
  • Abstract
    Extract-Transform-Load (ETL) processes play an important role in data warehousing. Typically, design work on ETL has focused on performance as the sole metric to make sure that the ETL process finishes within an allocated time window. However, other quality metrics are also important and need to be considered during ETL design. In this paper, we address ETL design for performance plus fault-tolerance and freshness. There are many reasons why an ETL process can fail and a good design needs to guarantee that it can be recovered within the ETL time window. How to make ETL robust to failures is not trivial. There are different strategies that can be used and they each have different costs and benefits. In addition, other metrics can affect the choice of a strategy; e.g., higher freshness reduces the time window for recovery. The design space is too large for informal, ad-hoc approaches. In this paper, we describe our QoX optimizer that considers multiple design strategies and finds an ETL design that satisfies multiple objectives. In particular, we define the optimizer search space, cost functions, and search algorithms. Also, we illustrate its use through several experiments and we show that it produces designs that are very near optimal.
  • Keywords
    data warehouses; fault tolerant computing; matrix algebra; optimisation; ETL design; ETL workflows optimization; QoX optimizer; cost functions; data warehousing; extract-transform-load processes; fault tolerance; optimizer search space; quality metrics; search algorithms; sole metric; Availability; Cost function; Data mining; Data warehouses; Design optimization; Fault tolerance; Maintenance; Robustness; Scalability; Warehousing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering (ICDE), 2010 IEEE 26th International Conference on
  • Conference_Location
    Long Beach, CA
  • Print_ISBN
    978-1-4244-5445-7
  • Electronic_ISBN
    978-1-4244-5444-0
  • Type

    conf

  • DOI
    10.1109/ICDE.2010.5447816
  • Filename
    5447816