DocumentCode :
2529255
Title :
Toward Predictive Failure Management for Distributed Stream Processing Systems
Author :
Gu, Xiaohui ; Papadimitriou, Spiros ; Yu, Philip S. ; Chang, Shu-Ping
Author_Institution :
North Carolina State Univ., Raleigh, NC
fYear :
2008
fDate :
17-20 June 2008
Firstpage :
825
Lastpage :
832
Abstract :
Distributed stream processing systems (DSPSs) have many important applications such as sensor data analysis, network security, and business intelligence. Failure management is essential for DSPSs, which often require highly available system operation. In this paper, we explore a new predictive failure management approach that employs online failure prediction to achieve more efficient failure management than previous reactive or proactive approaches. We employ lightweight stream-based classification methods to perform online failure forecasting. Based on the prediction results, the system can take differentiated failure prevention actions on abnormal components only. Our failure prediction model is tunable and can achieve a desired tradeoff between failure penalty reduction and prevention cost based on a user-defined reward function. To achieve low-overhead online learning, we propose adaptive data stream sampling schemes that adjust measurement sampling rates based on the states of monitored components and maintain a limited-size set of historical training data using reservoir sampling. We have implemented an initial prototype of the predictive failure management framework within the IBM System S distributed stream processing system. Experimental results show that our system can achieve more efficient failure management than conventional reactive and proactive approaches, while imposing low overhead on the DSPS.
Keywords :
distributed processing; fault tolerant computing; query processing; sampling methods; IBM System S distributed stream processing systems; adaptive data stream sampling schemes; business intelligence; continuous query processing; failure penalty prevention; failure penalty reduction; light-weight stream-based classification methods; measurement sampling rates; network security; online failure prediction; predictive failure management; reservoir sampling; sensor data analysis; Condition monitoring; Cost function; Data analysis; Data security; Intelligent networks; Intelligent sensors; Predictive models; Sampling methods; Sensor systems and applications; Size measurement; Data Stream Processing; Failure Prediction; Fault Tolerance; System Mining;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Distributed Computing Systems, 2008. ICDCS '08. The 28th International Conference on
Conference_Location :
Beijing
ISSN :
1063-6927
Print_ISBN :
978-0-7695-3172-4
Electronic_ISBN :
1063-6927
Type :
conf
DOI :
10.1109/ICDCS.2008.34
Filename :
4595959