DocumentCode
710126
Title
Cache-oblivious scheduling of shared workloads
Author
Bar, Arian ; Golab, Lukasz ; Ruehrup, Stefan ; Schiavone, Mirko ; Casas, Pedro
Author_Institution
Telecommun. Res. Center Vienna, Vienna, Austria
fYear
2015
fDate
13-17 April 2015
Firstpage
855
Lastpage
866
Abstract
Shared workload optimization is feasible if the set of tasks to be executed is known in advance, as is the case in updating a set of materialized views or executing an extract-transform-load workflow. In this paper, we consider data-intensive workloads with precedence constraints arising from data dependencies. While there has been previous work on identifying common subexpressions and task re-ordering to enable shared scans, in this paper we solve the problem of scheduling shared data-intensive workloads in a cache-oblivious way. Our solution relies on a novel formulation of precedence constrained scheduling with the additional constraint that once a data item is in the cache, all tasks that require this item should execute as soon as possible thereafter. We give an optimal algorithm using A* search over the space of possible orderings, and we propose efficient and effective heuristics that obtain nearly optimal schedules in much less time. We present experimental results on real-life data warehouse workloads and the TPC-DS benchmark to validate our claims.
Keywords
cache storage; data warehouses; scheduling; search problems; A* search; TPC-DS benchmark; cache-oblivious scheduling; data dependencies; data warehouse workloads; extract-transform-load workflow; heuristics; nearly-optimal schedules; optimal algorithm; ordering space; precedence constrained scheduling; shared data-intensive workload scheduling; shared workload optimization; task set; Bandwidth;
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Engineering (ICDE), 2015 IEEE 31st International Conference on
Conference_Location
Seoul
Type
conf
DOI
10.1109/ICDE.2015.7113339
Filename
7113339