Title :
Data stream partitioning re-optimization based on runtime dependency mining
Author :
Viel, Emeric ; Ueda, Hiroshi
Author_Institution :
Syst. Software Labs., Fujitsu Labs. Ltd., Kawasaki, Japan
fDate :
March 31 2014-April 4 2014
Abstract :
In distributed data stream processing, a program made of multiple queries can be parallelized by partitioning input streams according to the values of specific attributes, called partitioning keys. Applying different partitioning keys to different queries requires re-partitioning intermediate streams, which adds communication and reduces throughput. Re-partitioning can be avoided by detecting dependencies between the partitioning keys applicable to each query. Existing partitioning optimization methods analyze query syntax at compile time to detect inter-key dependencies and avoid re-partitioning. This paper extends those compile-time methods with a runtime re-optimization step based on mining temporal approximate dependencies (TADs) between partitioning keys. A TAD is defined in this paper as a type of dependency that can be approximately valid over a moving time window. Our evaluation, based on a simulation of the Linear Road Benchmark, showed a 94.5% reduction in the extra communication cost.
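The idea of a dependency that holds approximately over a moving time window can be illustrated with a small sketch. This is not the paper's algorithm: the abstract does not give the formal TAD definition or the mining procedure, so the sketch assumes a common confidence measure for approximate dependencies (the fraction of tuples in the window that agree with the most frequent right-hand value for their left-hand value). The class name, the field names `vehicle` and `segment`, and the window semantics are all hypothetical.

```python
from collections import Counter, defaultdict, deque


class WindowedDependency:
    """Tracks how well key_x -> key_y holds over the last `window` time units.

    Illustrative only: the confidence measure is an assumption, not the
    definition used in the paper.
    """

    def __init__(self, key_x, key_y, window):
        self.key_x = key_x
        self.key_y = key_y
        self.window = window
        self.events = deque()               # (timestamp, x, y), oldest first
        self.counts = defaultdict(Counter)  # x value -> Counter of y values

    def add(self, timestamp, record):
        """Register one tuple, then drop tuples that left the window."""
        x, y = record[self.key_x], record[self.key_y]
        self.events.append((timestamp, x, y))
        self.counts[x][y] += 1
        self._expire(timestamp)

    def _expire(self, now):
        # A tuple is in the window if its timestamp is in (now - window, now].
        while self.events and self.events[0][0] <= now - self.window:
            _, x, y = self.events.popleft()
            self.counts[x][y] -= 1
            if self.counts[x][y] == 0:
                del self.counts[x][y]
            if not self.counts[x]:
                del self.counts[x]

    def confidence(self):
        """Fraction of windowed tuples consistent with key_x -> key_y."""
        total = len(self.events)
        if total == 0:
            return 1.0
        kept = sum(max(c.values()) for c in self.counts.values())
        return kept / total


dep = WindowedDependency("vehicle", "segment", window=10)
for ts, vehicle, segment in [(0, "v1", "s1"), (1, "v1", "s1"),
                             (2, "v2", "s2"), (3, "v1", "s2")]:
    dep.add(ts, {"vehicle": vehicle, "segment": segment})
print(dep.confidence())
```

Under these assumptions, a re-optimizer could treat two partitioning keys as interchangeable while the confidence stays above a chosen threshold, and fall back to re-partitioning when it drops. The threshold and the fallback policy are left open here because the abstract does not specify them.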
Keywords :
data handling; distributed processing; optimisation; query processing; TAD; compile time methods; data stream partitioning reoptimization; distributed data stream processing; linear road benchmark; multiple queries; optimization methods; query syntax analysis; runtime dependency mining; temporal approximate dependencies; Accuracy; Data mining; Monitoring; Optimization; Roads; Routing; Runtime; DSMS; data-stream processing; dependency mining; partitioning optimization
Conference_Title :
Data Engineering Workshops (ICDEW), 2014 IEEE 30th International Conference on
Conference_Location :
Chicago, IL
DOI :
10.1109/ICDEW.2014.6818327