Title :
Cache-oblivious scheduling of shared workloads
Author :
Bar, Arian ; Golab, Lukasz ; Ruehrup, Stefan ; Schiavone, Mirko ; Casas, Pedro
Author_Institution :
Telecommun. Res. Center Vienna, Vienna, Austria
Abstract :
Shared workload optimization is feasible if the set of tasks to be executed is known in advance, as is the case when updating a set of materialized views or executing an extract-transform-load (ETL) workflow. In this paper, we consider data-intensive workloads with precedence constraints arising from data dependencies. While previous work has identified common subexpressions and re-ordered tasks to enable shared scans, we solve the problem of scheduling shared data-intensive workloads in a cache-oblivious way. Our solution relies on a novel formulation of precedence-constrained scheduling with the additional constraint that once a data item is in the cache, all tasks that require this item should execute as soon as possible thereafter. We give an optimal algorithm using A* search over the space of possible orderings, and we propose efficient and effective heuristics that obtain nearly optimal schedules in much less time. We present experimental results on real-life data warehouse workloads and the TPC-DS benchmark to validate our claims.
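As an illustration only, the following Python sketch casts precedence-constrained ordering as A* search over the sets of already-scheduled tasks. The objective is a hypothetical proxy for the cache constraint described above, not the paper's cost model: each data item is charged the distance between the first and last task that reads it, so an item is penalized for every step it stays "open" while unrelated tasks run. The names `astar_schedule`, `deps`, and `reads` are invented for this example.

```python
import heapq
from itertools import count


def astar_schedule(tasks, deps, reads):
    """A* over topological orderings (illustrative sketch, hypothetical objective).

    tasks: list of task names
    deps:  dict task -> set of prerequisite tasks (precedence constraints)
    reads: dict task -> set of data items the task reads
    Cost (assumed proxy): sum over items of (last reader position - first reader position).
    """
    users = {}
    for t in tasks:
        for item in reads.get(t, ()):
            users.setdefault(item, set()).add(t)

    def open_items(done):
        # Items already read by a scheduled task and still needed by an unscheduled one.
        return sum(1 for us in users.values() if us & done and not us <= done)

    def h(done):
        # Lower bound: an item with r unscheduled readers stays open >= r - 1 more steps.
        return sum(max(len(us - done) - 1, 0) for us in users.values())

    start = frozenset()
    tie = count()  # tie-breaker so heapq never compares frozensets or lists
    frontier = [(h(start), next(tie), 0, start, [])]
    best_g = {start: 0}
    while frontier:
        _, _, g, done, order = heapq.heappop(frontier)
        if g > best_g.get(done, float("inf")):
            continue  # stale queue entry
        if len(done) == len(tasks):
            return order, g
        for t in tasks:
            # Only tasks whose prerequisites are all scheduled may run next.
            if t in done or not deps.get(t, set()) <= done:
                continue
            nxt = done | {t}
            g2 = g + open_items(nxt)  # step cost depends only on the scheduled set
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + h(nxt), next(tie), g2, nxt, order + [t]))
    return None, None


order, cost = astar_schedule(
    ["A", "B", "C", "D"],
    {"D": {"A"}},
    {"A": {"x"}, "B": {"y"}, "C": {"x"}, "D": {"y"}},
)
print(order, cost)
```

In this usage example, A and C read item `x`, B and D read item `y`, and D must follow A; the search returns an ordering with total span 2, in which each item's two readers run back to back. Because the cost added at each step depends only on which tasks are already done, states can be merged by task set, and the bound `h` never overestimates the remaining cost, so the first complete ordering popped is optimal under this proxy. The state space is still exponential in the number of tasks, which is why such an exact search suits only small inputs.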
Keywords :
cache storage; data warehouses; scheduling; search problems; A* search; TPC-DS benchmark; cache-oblivious scheduling; data dependencies; data warehouse workloads; extract-transform-load workflow; heuristics; nearly-optimal schedules; optimal algorithm; ordering space; precedence constrained scheduling; shared data-intensive workload scheduling; shared workload optimization; task set; Bandwidth;
Conference_Title :
Data Engineering (ICDE), 2015 IEEE 31st International Conference on
Conference_Location :
Seoul
DOI :
10.1109/ICDE.2015.7113339