DocumentCode :
610372
Title :
Robust distributed stream processing
Author :
Lei, Chuan ; Rundensteiner, E.A. ; Guttman, Joshua D.
Author_Institution :
Comput. Sci. Dept., Worcester Polytech. Inst., Worcester, MA, USA
fYear :
2013
fDate :
8-12 April 2013
Firstpage :
817
Lastpage :
828
Abstract :
Distributed stream processing systems must function efficiently for data streams that fluctuate in their arrival rates and data distributions. Yet repeated and prohibitively expensive load re-allocation across machines may make these systems ineffective, potentially resulting in data loss or even system failure. To overcome this problem, we instead propose a Robust Load Distribution (RLD) strategy that tolerates data fluctuations. RLD provides ϵ-optimal query performance under load fluctuations without suffering from the performance penalty caused by load migration. RLD is based on three key strategies. First, we model robust distributed stream processing as a parametric query optimization problem. The notions of robust logical and robust physical plans are then overlays of this parameter space. Second, our Early-terminated Robust Partitioning (ERP) finds a set of robust logical plans that covers the parameter space while minimizing the number of prohibitively expensive optimizer calls, with a probabilistic bound on the space coverage. Third, our OptPrune algorithm maps the space-covering logical solution to a single robust physical plan that tolerates deviations in data statistics and maximizes the parameter space coverage at runtime. Our experimental study using stock market and sensor network streams demonstrates that our RLD methodology consistently outperforms state-of-the-art solutions in terms of efficiency and effectiveness in highly fluctuating data stream environments.
Keywords :
distributed processing; probability; query processing; resource allocation; statistical analysis; ERP; OptPrune algorithm; RLD methodology; RLD strategy; data distributions; data fluctuations; data loss; data statistics; data stream environments; data streams; distributed stream processing systems; early-terminated robust partitioning; load distribution strategy; load fluctuations; load migration; load re-allocation across machines; optimal query performance; parameter space coverage; parametric query optimization problem; performance penalty; probabilistic bound; prohibitively expensive optimizer calls; robust distributed stream processing; robust logical plans; robust physical plans; sensor networks streams; space-covering logical solution; state-of-the-art solutions; stock market; system failure; Digital signal processing; Partitioning algorithms; Query processing; Robustness; Runtime; Silicon; Uncertainty;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Engineering (ICDE), 2013 IEEE 29th International Conference on
Conference_Location :
Brisbane, QLD
ISSN :
1063-6382
Print_ISBN :
978-1-4673-4909-3
Electronic_ISBN :
1063-6382
Type :
conf
DOI :
10.1109/ICDE.2013.6544877
Filename :
6544877