Title of article :
Scheduling strategies for efficient ETL execution
Author/Authors :
Anastasios Karagiannis, Panos Vassiliadis, Alkis Simitsis
Issue Information :
Journal with serial issue number, year 2013
Abstract :
Extract-transform-load (ETL) workflows model the population of enterprise data warehouses with information gathered from a large variety of heterogeneous data sources. ETL workflows are complex design structures that run under strict performance requirements and their optimization is crucial for satisfying business objectives. In this paper, we deal with the problem of scheduling the execution of ETL activities (a.k.a. transformations, tasks, operations), with the goal of minimizing ETL execution time and allocated memory. We investigate the effects of four scheduling policies on different flow structures and configurations and experimentally show that the use of different scheduling policies may improve ETL performance in terms of memory consumption and execution time. First, we examine a simple, fair scheduling policy. Then, we study the pros and cons of two other policies: the first opts for emptying the largest input queue of the flow and the second for activating the operation (a.k.a. activity) with the maximum tuple consumption rate. Finally, we examine a fourth policy that combines the advantages of the latter two in synergy with flow parallelization.
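The first three policies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Activity` class, its `queue_len` and `rate` fields, and the `pick_*` function names are hypothetical, and the sketch assumes each activity's tuple consumption rate is known in advance. The fourth, combined policy with flow parallelization is not covered.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    """Hypothetical stand-in for one ETL activity in the flow."""
    name: str
    queue_len: int   # tuples currently waiting in the input queue
    rate: float      # tuples consumed per time unit (assumed known)


def pick_round_robin(activities, turn):
    """Fair policy: starting at position `turn`, take the next activity with input."""
    n = len(activities)
    for offset in range(n):
        candidate = activities[(turn + offset) % n]
        if candidate.queue_len > 0:
            return candidate
    return None  # nothing has input waiting


def pick_largest_queue(activities):
    """Choose the activity whose input queue is currently the largest."""
    ready = [a for a in activities if a.queue_len > 0]
    return max(ready, key=lambda a: a.queue_len, default=None)


def pick_max_rate(activities):
    """Choose the ready activity with the highest tuple consumption rate."""
    ready = [a for a in activities if a.queue_len > 0]
    return max(ready, key=lambda a: a.rate, default=None)
```

Given three activities where one has the longest queue and another the highest rate, the last two functions will generally pick different activities, which is the trade-off between memory consumption and execution time that the abstract refers to.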
Keywords :
Record linkage, Entity resolution, Data matching, Data quality, Privacy techniques, Survey
Journal title :
Information Systems